Want to know how Apple’s Genius song recommendation system for iTunes works? Apple engineer Erik Goldman offered up some insights to users of answer service Quora in a post back in May. While Goldman’s post has since been deleted, Christopher Mims covered it in an MIT Technology Review story on Wednesday. Goldman’s answer on Quora offered a sneak peek into the way big data analytics and aggregated personal information combine to personalize song recommendations and create a custom long tail of content for iTunes customers. The Genius service boosts revenue for Apple, but insights into its inner workings could also benefit the web as a whole.
Recommendation engines are the key to squeezing the entire web into small devices like mobile phones and to creating a hyperpersonalized surfing experience. For consumers, the web has opened up billions of opportunities to find content, with much of it contained in the so-called long tail made famous by Wired’s Chris Anderson. But mere mortals can’t filter through all the possibilities to discover what the heck they want to read, watch or listen to. Hence the popularity of recommendation engines and discovery services from companies like Amazon (s amzn), Apple (s aapl), Netflix (s nflx) and even Google (s goog).
Despite the fact that Goldman’s original answer mysteriously disappeared the day the Technology Review’s post drew attention to it (if you want to see what may be Goldman’s original answer, check the screenshot at the end of the post), Mims unpacked the mystery of how the recommendation service works. The heart of the Genius recommendation system is statistics applied to a mess of data. The initial goal is to take an individual’s playlist, measure the frequency of certain elements (such as the artist) and determine how significant each element might be in making a recommendation. To do that, the algorithms check the frequency of those elements in other Genius users’ playlists to see which ones occur widely and which ones don’t. An element that shows up almost everywhere says little about a listener’s taste, while a rare one says a lot. This allows the system to compare playlists between people who like the same obscure bands rather than trying to draw conclusions based on the hundreds of millions of playlists that include Lady Gaga’s “Bad Romance.”
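To make that weighting idea concrete, here is a minimal Python sketch. It is not Apple's code and nothing in it comes from Goldman's post: the function names, the listener names and the toy playlists are all made up for illustration, and the logarithmic inverse-frequency weight is simply one common way to down-weight ubiquitous items.

```python
import math
from collections import Counter


def artist_weights(playlists):
    """Weight each artist by how rare it is across all playlists.

    An artist found in every playlist gets log(1) = 0, so it contributes
    nothing; an artist found in few playlists gets a larger weight.
    """
    n = len(playlists)
    # Count how many playlists each artist appears in (once per playlist).
    playlist_freq = Counter(a for p in playlists.values() for a in set(p))
    return {artist: math.log(n / count) for artist, count in playlist_freq.items()}


def similarity(a, b, weights):
    """Weighted overlap between two playlists, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    shared = sum(weights.get(x, 0.0) for x in a & b)
    total = sum(weights.get(x, 0.0) for x in a | b)
    return shared / total if total else 0.0


# Hypothetical listeners and playlists, invented for this example.
playlists = {
    "ann": ["Lady Gaga", "Obscure Band A"],
    "bob": ["Lady Gaga", "Obscure Band A", "Obscure Band B"],
    "cat": ["Lady Gaga", "Pop Act C"],
    "dan": ["Lady Gaga", "Pop Act C"],
}

weights = artist_weights(playlists)
ann_bob = similarity(playlists["ann"], playlists["bob"], weights)
ann_cat = similarity(playlists["ann"], playlists["cat"], weights)
print(weights["Lady Gaga"], ann_bob, ann_cat)
```

In this toy data everyone has Lady Gaga, so her weight drops to zero and Ann ends up matched with Bob, who shares her obscure band, rather than with Cat, who shares only the hit.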
The second element relies on finding the so-called “latent factors,” a small set of hidden variables the recommendation engine can use to reduce the amount of data it must parse. Christopher Mims writes:
Latent factors are what shakes out when you do a particular kind of statistical analysis, called a factor analysis, on a set of data, looking for the hidden, unseen variables that cause the variation in all the different variables you’re examining. Let’s say that the variability in a dozen different variables turns out to be caused by just four or five “hidden” variables — those are your latent factors. They cause many other variables to move in more or less lock-step.
Discovering the hidden or “latent” factors in your data set is a handy way to reduce the size of the problem that you have to compute, and it works because humans are predictable: people who like Emo music are sad, and sad people also like the soundtracks to movie versions of vampire novels that are about yearning, etc. You might think of it as the mathematical expression of a stereotype — only it works.
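For the curious, a rough Python sketch of the idea follows. It is an illustration under stated assumptions, not the Genius algorithm: it swaps in a simpler relative of factor analysis (extracting the single strongest principal component by power iteration), and the function name, the play counts and the column labels are all invented.

```python
import math


def top_latent_factor(matrix, iterations=100):
    """Find the strongest hidden factor in a listeners-by-items matrix.

    Returns (loadings, scores): one loading per column (item) and one
    score per row (listener). Uses power iteration on the column-centered
    data, which converges to the first principal component.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]

    def project(v):
        # Score of each listener along direction v.
        return [sum(x * w for x, w in zip(row, v)) for row in centered]

    v = [1.0] + [0.0] * (n_cols - 1)  # arbitrary starting direction
    for _ in range(iterations):
        scores = project(v)
        # Multiply by the transpose to get the next estimate of the loadings.
        nxt = [sum(centered[i][j] * scores[i] for i in range(n_rows))
               for j in range(n_cols)]
        norm = math.sqrt(sum(w * w for w in nxt))
        if norm == 0.0:  # no variation in the data, nothing to find
            break
        v = [w / norm for w in nxt]
    return v, project(v)


# Hypothetical play counts. Columns: emo, vampire soundtrack, dance pop, club remix.
plays = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
]

loadings, scores = top_latent_factor(plays)
print(loadings)
print(scores)
```

Four columns of play counts collapse into one number per listener: the first two columns load on the factor with one sign and the last two with the other, so a single score is enough to tell the two kinds of listener apart.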
These techniques aren’t rocket science; they’re statistics-based (which, given my performance in stats as opposed to physics, is much harder). To learn more about how latent factors are uncovered, Goldman recommended folks turn to the site operated by the recent winners of the Netflix recommendation prize. For laypersons, I can recommend Wired’s awesome story covering the race to win the Netflix prize, which shows how most of the people trying to improve recommendation engines are doing so in the open and piggybacking on each other’s efforts. That is something Apple doesn’t seem to be endorsing, given that Goldman’s post was deleted.
As the devices on which we consume our information become smaller, the need for better recommendations has moved beyond a nicety for discovering long-tail content into a necessity for displaying optimal results quickly over a mobile connection and on a small screen. I discussed this problem with Elizabeth Churchill, principal research scientist and manager of the Internet Experiences Group at Yahoo (s yahoo), a while back, and she emphasized that tailored recommendations are important for mobile users not only because the screen sizes are small, but also because mobile connections are slower and people don’t have the patience to wait for a lot of results to load.
My theory is that using compute clouds to access huge amounts of data, crunching that data into prescient recommendations and delivering them in a format fit for mobile consumption will be the key stepping stones for the next generation of the web.
This article also appeared on BusinessWeek.com