
Let's Fix How We Talk About Data


I recently had a discussion with a friend and baseball fan about the value of WAR (Wins Above Replacement). WAR is a measure of the value of a baseball player, incorporating offensive and defensive performance. WAR attempts to answer the question "How many wins is a player worth, over an available replacement (e.g. bench player or minor league player)?"

Our conversation started with a text that compared the WAR of two players. In his opinion, the player with the lower WAR was unquestionably better than the player with the higher WAR. He suggested the need for a "fact-checker" to find errors in the formula. Because, in his view, the measure failed to rank these two players correctly, he considered WAR to be fatally flawed. In fairness to my friend, he was initially set off by an ESPN host using WAR as his primary argument for Adam Dunn as a Hall of Famer (he is not one).

What's WAR got to do with it?

I share this story not as an argument for WAR or against Adam Dunn, but because it illustrates the two big problems with the way we talk about data. One side dismisses data out of hand, and the other side credits data with more power than it actually has. One chooses to know nothing (beyond what he sees or feels), and the other thinks he knows everything. Both problems stem from a lack of understanding of how various statistics should and should not be interpreted and applied. These two problems are pervasive in sports, political analysis, and business, and in the end they make data less useful.

So let's make a few things clear.

Data can, and should, be used to reduce the amount of uncertainty about something that matters. This is the whole point of research. Whether you care about how valuable a baseball player is to his team, the likelihood of a customer making a purchase on your website, or the probability of a person dying from a particular disease, data helps reduce uncertainty.

Models that estimate value, forecast events, predict behavior, etc., help us understand things and make better decisions. WAR (the baseball statistic), for example, tells us more about a player’s value than we would otherwise know from looking at a box score or watching a game. Assembling a team based on WAR rather than (or in addition to) single measures like home runs, batting average, strikeouts, ERA, etc. will most likely result in a better team.

Data cannot, and should not be assumed to, eliminate ALL uncertainty about anything. If this is your expectation of a statistic, forecast, predictive algorithm, etc., you are giving too much power to data analytics. Even the most precise measurements are not perfect.

"If a man tells you he knows a thing exactly, then you can be safe in inferring that you are speaking to an inexact man." - Bertrand Russell (1872-1970), British mathematician and philosopher

No model can tell you with absolute certainty that Customer A will buy Product B. Likewise, WAR should not be used to say that Player A is definitively better than Player B. Given this limitation, however, it is still better to know more, right? Just because we cannot eliminate ALL uncertainty does not mean we should not eliminate as much as we can.

"Ignorance is never better than knowledge" - Enrico Fermi, Nobel Prize, Physics, 1938

Changing the conversation.

In order to increase the value that data analytics can add to our teams, companies, society, etc., we need to fix the way we talk about it. This starts with increasing data literacy. This is not to say that everyone needs to be able to articulate what a p-value is. We can start by appropriately calibrating expectations in terms of both the power and limitations of data analytics. Evangelically touting a model or measure as THE answer will only increase skepticism. At the same time, dismissing a model out of hand because it has limitations is missing the forest for the trees.

Data analytics has limitations. Data analytics adds value. Both are true. Once both sides of the discussion share an understanding of these two simple but critical truths, the conversation can advance from whether we should use analytics to the more important and productive question of how we use analytics to add clarity to the things that are important to us.
