Averages are dumb. That is one of the first things I learned as a data analyst. In fact, they are so dumb they often lead to severe misunderstandings, if not mismanagement.
The average price for this, the average revenue for that, the average score on this test… As a society, overrating the statistical notion of “arithmetic mean” is possibly one of the great crimes we have committed against our ability to understand natural phenomena.
The original intention of bringing the arithmetic mean to business problems, I believe, was to give analysts a way to abstract away from the detail of statistical distributions. The catch is that for many of the distributions inherent in real business problems, the arithmetic mean is quite misleading.
Let’s elaborate. Say you have a set of people who were selected through a physical fitness test, and you’d like to get a sense of their height. You add up their heights in inches, divide the total by the number of people, and you can now say you have an idea of the average height in the group. Using the arithmetic mean here is relatively valid: adult humans have broadly similar heights, variance is low, and you are comparing apples to apples.
Now, if you have a set of people and you’re looking for their average income, you’re already in trouble. You can’t take a group and average out their incomes without accepting in advance that you will probably lose much of the information you were originally after, and end up with a number that will (or rather, should) not mean anything to anyone.
The reason? People’s incomes vary on a much larger scale than their heights. Where I live (Istanbul), the minimum wage is around $400 per month, but it’s not difficult to find someone who makes $4,000 in my neighborhood. There is also a group of people around with a $40,000 paycheck. You cannot draw a sample from these people and average out their income. If you came up with the number $1,000, for example, it would tell you nothing about the standard of living in your sample: (1) you don’t know how many people are considerably rich, and (2) you don’t know how many people make the minimum wage.
With such high-variance samples, it also comes as no surprise to seasoned data analysts when a young, rich pop singer crawls into the sample and doubles the average. You’re now looking at a $2,000 average, only because a variant of Justin Bieber joined your sample with a million-something monthly income. Back to the analogy of apples and oranges: what you’re practically doing is calculating how much the average fruit weighs. By throwing a couple of coconuts and a watermelon into a pile of cherries, that is.
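To make this concrete, here is a minimal Python sketch. The income figures are made up for illustration (chosen to loosely echo the numbers above); the point is only to show how one extreme earner drags the mean while the median barely notices:

```python
import statistics

# A hypothetical sample of 999 monthly incomes (figures are invented):
# mostly minimum-wage earners, some middle earners, a few high earners.
incomes = [400] * 893 + [4000] * 100 + [40000] * 6

print(statistics.mean(incomes))    # ≈ 998  — "average income" near $1,000
print(statistics.median(incomes))  # 400    — the typical income

# One pop star with a seven-figure month joins the sample...
incomes.append(1_002_800)

print(statistics.mean(incomes))    # 2000.0 — the mean has doubled
print(statistics.median(incomes))  # 400    — the median hasn't moved
```

One person out of a thousand doubled the mean; the median, being a rank-based statistic, stayed put. That robustness is exactly why the median (or a look at the whole distribution) is usually the better summary for skewed quantities like income.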
So, next time you jump at that AVERAGE function in your spreadsheet of choice, take a step back. Chances are, you’re losing a lot of information and calculating a very misleading number. Think about it; it’ll come to you.