The Numbers Game
On precision and accuracy…
As some of you may know, I’ve blogged a lot recently about the COVID-19 data. I wrote five pieces about the inaccuracies in the data collection and interpretation processes and then followed them up with a few articles interpreting the data at a given point in time. Someone asked me recently how I could feel comfortable interpreting data that I know isn’t completely accurate. That’s a legit question with significant business implications, so I thought it might be useful to share a few thoughts. Here are a few things to keep in mind:
Clarify what decision you are trying to make.
Does your data need to be perfect? It usually isn’t, and the good news is that it usually doesn’t need to be. The answer to the question of how accurate data needs to be is simple: it needs to be accurate enough to enable you to make good decisions.
An illustration I often use is that if I’m going outside and need to know how to dress, do I need to know whether it’s 85 or 86 degrees outside… or is it enough to know that it’s “over 80”… or is it enough to simply know “it’s hot”? I’m not going to dress differently for 85 degrees versus 86, so in this situation it’s good enough to know “it’s hot” (there’s a quick sketch of this idea after the list below). That has two important implications:
- There’s no need to overcomplicate the data collection process to get more precision than necessary.
- There’s no need to fret about the data not being “perfect” to the nth degree… because it doesn’t need to be.
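If it helps to see the idea in code, here’s a minimal sketch (the temperature buckets are invented for illustration):

```python
def dressing_advice(temp_f: float) -> str:
    """Map a temperature reading to a clothing decision.

    The decision only depends on which bucket the reading falls
    into, so coarse data is plenty accurate here.
    """
    if temp_f > 80:
        return "it's hot: dress light"
    if temp_f > 60:
        return "mild: grab a light jacket"
    return "cold: bundle up"

# 85 and 86 land in the same bucket, so the extra degree of
# precision changes nothing about the decision.
assert dressing_advice(85) == dressing_advice(86)
```

Since the decision never looks past the bucket, measuring to the tenth of a degree would be wasted effort.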
There are of course situations where greater precision is necessary. In the movie Speed, a group of people were on a bus that had a bomb on it. The bus was going 60 MPH, and the bomb was rigged to go off if the speed dropped below 50 MPH. In that case, the difference between 49 and 50 was significant enough to warrant more precision! The point is that you need to think about the precision you need up front. If you wouldn’t make a different business decision when the data shows “3” versus “3.1415”, then make the data collection easy on yourself.
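One way to frame the Speed scenario: the precision you need depends on how close your reading sits to the decision boundary. A rough sketch (the error figures are my own assumptions, not anything from the movie):

```python
def decision_is_robust(reading: float, threshold: float, max_error: float) -> bool:
    """True if the reading is far enough from the threshold that
    measurement error can't flip the decision."""
    return abs(reading - threshold) > max_error

# Bomb arms below 50 MPH; suppose the speedometer is accurate to +/- 3 MPH.
print(decision_is_robust(60, 50, 3))  # True: comfortably above the line
print(decision_is_robust(51, 50, 3))  # False: too close to call, need a better gauge
```

At 60 MPH a sloppy gauge is fine; at 51 MPH you suddenly care about every decimal.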
Understand the difference between imperfect data and useless data.
The COVID data is actually a great illustration of this. According to Worldometer, as of this moment there are 16,498,383 COVID cases globally, and 649,899 COVID-related deaths. I’d wager that neither of those figures is correct. Everyone has heard stories about improperly recorded data, and I don’t doubt them at all. That makes the data imperfect, but it doesn’t necessarily make it useless. I ask a few questions to determine what can be gleaned from imperfect data:
Have the same imperfections been affecting the data over the course of the collection period, or has something new happened to change the rate of imperfection? This affects my interpretation: if something new has happened, then comparing data from before and after the event is comparing apples and oranges.
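As a sketch of what I mean (the counts and the change date here are entirely made up), you’d compare trends within each reporting regime rather than across the seam:

```python
from datetime import date

# Hypothetical daily case counts; say the reporting method changed on June 1.
daily_cases = {
    date(2020, 5, 29): 900,
    date(2020, 5, 30): 950,
    date(2020, 5, 31): 1000,
    date(2020, 6, 1): 1800,  # jump reflects the new counting method,
    date(2020, 6, 2): 1850,  # not a real doubling of cases
    date(2020, 6, 3): 1900,
}
change_date = date(2020, 6, 1)

before = [v for d, v in sorted(daily_cases.items()) if d < change_date]
after = [v for d, v in sorted(daily_cases.items()) if d >= change_date]

def trend(counts):
    """Average day-over-day change within one reporting regime."""
    return (counts[-1] - counts[0]) / (len(counts) - 1)

# Both regimes show ~50 new cases/day; the scary-looking jump at the
# seam is an artifact of the methodology change, not the virus.
print(f"before: {trend(before):.0f}/day, after: {trend(after):.0f}/day")
```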
Also, going back to point one: are we attempting to determine whether the virus is “dangerous,” or do we need to know precisely how many cases and deaths there are? If the standard for “dangerous” is >1,000,000 cases and >100,000 deaths, then I’m quite comfortable crawling out on that limb to make the claim based on the data we do have.
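Here’s the back-of-the-envelope version of that claim. The 5x error factor below is a deliberately generous assumption, not an estimate of actual miscounting:

```python
reported_cases, reported_deaths = 16_498_383, 649_899
danger_cases, danger_deaths = 1_000_000, 100_000

# Suppose the reported figures are overstated by as much as 5x.
error_factor = 5
worst_case_cases = reported_cases / error_factor    # ~3.3 million
worst_case_deaths = reported_deaths / error_factor  # ~130 thousand

# The "dangerous" verdict holds even at the pessimistic end of the range.
print(worst_case_cases > danger_cases and worst_case_deaths > danger_deaths)  # True
```

When a conclusion survives that kind of haircut, the imperfections in the data stop mattering for this particular decision.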
Moral of the story:
Don’t ask if your data is perfect. Ask if it’s close enough to tell you what you need to know.