Beware of the giraffes in your data

Marketers and analysts are always on the lookout for new insights that can translate into action items and provide a strategic advantage, yet they often miss them. They can even make the wrong decisions because they fail to account for the “giraffe effect” in their data.
Giraffes are what I call portions of a data set that dominate the rest of the data and hide important insights. Sometimes they even lead to wrong conclusions. For example, a gaming company client looking for its highest-value customers thought the data said it should market to men, when in fact women spent twice as much as those with a Y chromosome. How could the data lie?
The truth is, it didn’t. The company was just distracted by a giraffe.

The giraffe, the fox, the cat and the mouse

Let’s say you’re out watching animals in a nature reserve. Undoubtedly, when you spot a majestic giraffe in your binoculars, you’re going to take a good look at it. Meanwhile, the other, smaller animals will all just seem, well, small. You won’t notice the significant differences in height among them, because next to the giraffe they all look about the same.
However, if you can take your eyes off the giraffe for a minute and zoom in on the smaller animals on the plain, an amazing thing happens: you become aware that the differences in size between them are actually much larger than you had first realized.
This is a very simple example of the giraffe effect. When people look at a set of data which includes some very large, dominant members, important differences among the other data in the set often disappear from view.

A website analytics example

In case you’re not already familiar with them, the following images are known as “heatmaps.” They show where visitors move and click the mouse most intensively on a webpage. Red areas indicate the most mouse activity, blue areas the least.
In this first heatmap, we see only one dark red area, namely the login password field. Because many of the visitors to this page are already registered users of the site, it makes sense that such a large percentage of mouse activity is centered on the login area. However, because all the mouse activity data is aggregated here, important information about where non-registered visitors are looking and clicking is hidden from the analyst’s view.
Once the analyst drills down and removes the giraffe from the data (the registered users), he sees a view of the data that is much more revealing as to the visitors’ areas of interest. Specifically, we see in the following heatmap a dozen red areas instead of only one. By separating out just one portion of the data (the registered users), the analyst uncovers the important information that will lead him to better decisions about how to improve the website.
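The same drill-down can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the click log, the page element names and the counts are all invented, and a real heatmap tool would work on pixel coordinates rather than named elements.

```python
from collections import Counter

# Hypothetical click log: (page_element, visitor_is_registered).
# All element names and counts are invented for illustration.
clicks = (
    [("login_password", True)] * 80
    + [("login_password", False)] * 2
    + [("signup_button", False)] * 9
    + [("pricing_link", False)] * 6
    + [("blog_link", False)] * 3
)


def click_share(events):
    """Return each page element's share of the clicks in `events`."""
    counts = Counter(element for element, _registered in events)
    total = sum(counts.values())
    return {element: count / total for element, count in counts.items()}


# Aggregated view: registered users (the giraffe) are included.
all_visitors = click_share(clicks)
# Drill-down view: registered users are removed before aggregating.
unregistered_only = click_share([event for event in clicks if not event[1]])

for label, shares in (("all visitors", all_visitors),
                      ("unregistered only", unregistered_only)):
    print(label)
    for element, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"  {element:<15} {share:.0%}")
```

With these made-up numbers, the login field takes 82 percent of all clicks in the aggregated view, while among unregistered visitors the signup button leads with 45 percent and the other elements become clearly visible.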

A customer analytics example

But what about that gaming company I mentioned earlier? Since many of our clients are gaming firms, we come across plenty of examples in this field where giraffes in the data can lead to poor decision making. And it’s not always easy to even know that a giraffe is lurking in your data, leading you astray.
Marketers for a particular company wanted to improve their customer acquisition efforts by focusing on the most lucrative customer segments. Naturally, one of the dimensions they considered was the gender of their players. A top-level aggregation of their data clearly showed that male players had a 39 percent higher customer lifetime value (LTV) than female players (the data has been simplified for the sake of this article):
The obvious conclusion from this analysis is to devote more resources to acquiring male players than female players. This, however, would be a mistake, because female players actually have a higher LTV in every country! Slicing the numbers by country makes this clear, with female LTV double male LTV in each one:
More than simply hiding insights, this aggregation actually led to an incorrect conclusion. How can this be? The reversal arises from two factors: the large discrepancy in the male/female split of players from country to country, and the large discrepancy in LTV from country to country. The following table shows the gender breakdown by country (the percentage figures refer to the distribution of customers in each row).
In this case, the UK is a huge giraffe lurking in the data: its players have a much larger LTV than those of the other countries, and its male/female proportion is the reverse of that in RU and the US. By drilling down a bit and looking at each country individually, the marketers were able to discover the ideal course of action.
While this kind of situation is admittedly unusual, it is an excellent demonstration of a hard-to-spot giraffe in the data. By the way, the paradoxical situation in which a trend that holds within every group reverses once the groups are aggregated, as in this example, is known as Simpson’s paradox.
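To see how such a reversal can arise arithmetically, here is a minimal Python sketch. The player counts and LTV values are invented for illustration and are not the client's figures; they only reproduce the pattern described above, with one high-LTV, mostly male country and two low-LTV, mostly female countries.

```python
# Hypothetical segments: (country, gender) -> (number of players, average LTV).
# These figures are invented; they are NOT the client's data.
segments = {
    ("UK", "male"): (900, 500.0),
    ("UK", "female"): (100, 1000.0),
    ("US", "male"): (100, 50.0),
    ("US", "female"): (900, 100.0),
    ("RU", "male"): (100, 20.0),
    ("RU", "female"): (900, 40.0),
}


def aggregated_ltv(segments, gender):
    """Player-weighted average LTV for one gender across all countries."""
    rows = [(n, ltv) for (_country, g), (n, ltv) in segments.items() if g == gender]
    return sum(n * ltv for n, ltv in rows) / sum(n for n, _ltv in rows)


def female_to_male_ratio_by_country(segments):
    """Female LTV divided by male LTV, computed separately per country."""
    countries = sorted({country for country, _gender in segments})
    return {
        country: segments[(country, "female")][1] / segments[(country, "male")][1]
        for country in countries
    }


male_overall = aggregated_ltv(segments, "male")
female_overall = aggregated_ltv(segments, "female")
ratios = female_to_male_ratio_by_country(segments)

print(f"aggregated LTV: male {male_overall:.0f}, female {female_overall:.0f}")
for country, ratio in ratios.items():
    print(f"{country}: female LTV is {ratio:.1f}x male LTV")
```

In this toy data set, female LTV is exactly double male LTV in every country, yet the aggregated male LTV comes out higher, because most male players sit in the high-LTV country and most female players sit in the low-LTV ones.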

Spotting giraffes

There are often giraffes in your data hiding important insights, and they can even lead to erroneous decision making. The handful of examples here is only the tip of the iceberg; there are many more ways that aggregated data can hide insights and mislead marketers and analysts. A few other common giraffes, and how to handle them, immediately come to mind:

  • Understand the true effectiveness of your SEO efforts by excluding all traffic from searches that include your brand names.
  • Make sure that data on one-time purchasers, who make up the majority of e-commerce customers, is not concealing important insights about the more valuable repeat customers.
  • Make sure that data on the 40 percent of iGaming players who churn after their first 24 hours is not leading you to incorrect conclusions about where the most valuable players are acquired.
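
The first item in the list above, for instance, could be sketched roughly as follows. The brand term, the queries and the visit counts are all hypothetical, and the simple substring match is an assumption; real brand filtering usually also needs to handle misspellings and product names.

```python
# Hypothetical brand terms and search-traffic rows: (query, visits).
BRAND_TERMS = ("acme",)

search_visits = [
    ("acme login", 5000),
    ("Acme casino bonus", 1200),
    ("best online slots", 300),
    ("free poker tips", 150),
]


def is_branded(query, brand_terms=BRAND_TERMS):
    """True if the query mentions any brand term (case-insensitive substring match)."""
    lowered = query.lower()
    return any(term in lowered for term in brand_terms)


total_visits = sum(visits for _query, visits in search_visits)
# Remove the giraffe: searches by people who already know the brand.
non_brand_visits = sum(
    visits for query, visits in search_visits if not is_branded(query)
)

print(f"all search visits:       {total_visits}")
print(f"non-brand search visits: {non_brand_visits}")
```

Here the branded queries account for 6,200 of 6,650 visits, so the 450 non-brand visits are the figure that actually reflects SEO performance in this made-up example.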

Discovering whether there are any giraffes in your data is sometimes easy: an obviously dominant value will be like a huge giraffe eye staring you in the face. In these cases, it’s important not to ignore it. If you don’t see an obvious giraffe at the aggregated level, look for one by slicing the data and checking each slice for dominant values. The most common way to do this is to add an additional dimension or two.
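One rough way to automate that check is to compute, for each candidate dimension, the share of a metric held by each value and flag any value above a cutoff. The sketch below assumes invented revenue rows and an arbitrary 75 percent threshold; the function name `dominant_values` and the threshold are my own illustration, not a standard rule.

```python
from collections import defaultdict

# Hypothetical revenue rows; all figures are invented for illustration.
rows = [
    {"country": "UK", "gender": "male", "revenue": 450000},
    {"country": "UK", "gender": "female", "revenue": 100000},
    {"country": "US", "gender": "male", "revenue": 5000},
    {"country": "US", "gender": "female", "revenue": 90000},
    {"country": "RU", "gender": "male", "revenue": 2000},
    {"country": "RU", "gender": "female", "revenue": 36000},
]


def dominant_values(rows, dimension, metric, threshold=0.75):
    """Return {value: share} for values of `dimension` whose share of the
    total `metric` is at least `threshold` (an assumed, arbitrary cutoff)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        value: total / grand_total
        for value, total in totals.items()
        if total / grand_total >= threshold
    }


# Check each dimension in turn for a dominant value.
for dimension in ("gender", "country"):
    print(dimension, dominant_values(rows, dimension, "revenue"))
```

With these numbers, slicing by gender flags nothing at the 75 percent cutoff, but adding the country dimension reveals that the UK holds roughly 80 percent of revenue, which is the cue to analyze it separately.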
In short, I strongly encourage marketers and analysts to dig down into their data, to look out for misleading dominant portions of the data, and not to rely only on high-level, aggregated views. Beware of the giraffes in your data!
Pini Yakuel is a data expert and the founder and CEO of Optimove, a retention automation platform.