An API invasion in streaming music has Pandora hoping better data trumps bigger data

Life is probably going to get a lot harder for streaming music pioneer Pandora.

Pandora is already under fire from Spotify Radio, iTunes Radio and Google Play Music, among myriad smaller services. Now the last month has brought new API-based offerings from metadata specialist Gracenote and music-data startup Senzari that aim to let anybody with a music app power it with personalization. A crowded space could be getting a whole lot more crowded.

On the surface, it appears that Pandora, despite its head start of at least a decade on most of its competitors, is facing an uphill battle. It’s not just the number of services that are potentially coming online but also the number of songs and the amount of data those services are offering up to consumers. The Echo Nest, which powers personalization for Spotify (among others), boasts data on more than 35 million songs (which is why Spotify’s library is able to hold more than 20 million songs). Upstart Senzari says its new MusicGraph platform contains around 20 million songs. Gracenote has data on more than 130 million.

Pandora, by contrast, has “more than 1 million” songs in its library. If consumers want unlimited choice in the music they can listen to, they probably don’t want Pandora. And the company is fine with that, or so it says. “I agree that more data is better, but that’s different than more content is better,” Pandora chief scientist and vice president of playlists Eric Bieschke told me in a recent interview (before Gracenote announced its Rhythm data platform, I should note).

That’s a pretty bold statement, but it sums up Pandora’s worldview in terms of where it thinks it stands in the streaming market. It also follows an increasingly popular story we’re hearing (one we’ll delve into at our Structure Data conference in March) now that the big part of big data is being exposed as overblown in some circles. It’s important to have big data, but better to have the right data; it’s important to have a lot of songs, but better to have the right ones.

Maybe it’s a wise strategy, or maybe it’s just a legacy mindset from Pandora’s early, low-tech days of hand-coding the musical landscape. It seems like we’ll find out soon enough.

Pandora doesn't let you search limitless songs, by design.

The radio versus the record store

Pandora’s business model is to act more like a radio station, where listeners hear a (hopefully) wide variety of good songs, and less like a record store (remember those?!), where consumers might come to peruse thousands of albums to discover some new bands. If you’re offering tens of millions of tracks, Bieschke explained, they’ll likely include all sorts of duplicate, karaoke and other random tracks, “which for a radio experience like Pandora would be a problem.” The company has access to all the songs legally available in the United States, but it’s better to have the tracks people really want to hear so they can spend their time listening (and occasionally giving a thumbs-up or thumbs-down) rather than skipping songs or searching for the version they really want, he said.

Even The Echo Nest CEO Jim Lucchese might acknowledge Pandora has a point. During an interview last May, he noted that his company is seeing a “huge and growing problem” with what he calls music spam. Some spam songs and albums are compilations, some are covers (and not by famous musicians, but by bands like The Hit Crew), and others are just “crazy” stuff that share keywords or attributes but that nobody really wants to hear. Regardless, it’s all junk to people just trying to find the original version of “Born in the U.S.A.,” for example.


A search for “Born in the USA” in Spotify.

And then, Bieschke said, there’s the fact that most modern-day music listening is focused around a relatively small number of songs, even if “not a lot of people want to admit it.”

“People want to think their tastes are so unique that a song should be made just for them,” he explained, but the reality is that most people will only hear tens of thousands of tracks over their whole life (and they probably won’t like all of them).

“[Tens of millions of songs is] more than you ever really want to hear,” Bieschke said. “… If we were a song-on-demand service, it would make a lot more sense to have more tracks.”

But size does matter

Still, it’s hard to discount consumers’ appetites for quantity. Pandora, which has been around since 2000, says it has more than 76 million active users and 3 million paying subscribers. Spotify, which has been around since 2008 and only launched in the United States in 2011, already claims more than 24 million total users and more than 6 million paying subscribers. Arguably, much of that has to do with the size of its catalog and the ability to search for — and find — particular music, but Spotify’s radio service certainly accounts for some of the interest.

But what if services like Spotify, even with their tens of millions of songs, actually can deliver a personalized music stream just as well as Pandora and cut through all the spam and bad songs? Or what if a company wanted to copy Pandora’s model of offering only what it thinks are the best 2 million songs ever? What if 10 different services popped up focused on high-quality streaming radio for specific music genres? (I’d sign up for a really good heavy metal service in an instant, if anyone thinking about starting one is reading this.)

Data platforms like The Echo Nest, as well as startups like Senzari and newcomers (to this space, at least) like Gracenote, theoretically make any of these scenarios possible. They don’t offer up any streaming music themselves, but they do provide intelligence on (literally, almost) whatever content users of their APIs want to offer. The Echo Nest already handles personalization and other capabilities for Spotify, as well as services for companies including Clear Channel, Univision and SiriusXM.

“When you say, ‘Play me a radio station based on Jack White,’ we’re the engine that’s powering that stuff,” The Echo Nest CEO Lucchese said.


The Echo Nest is collecting lots of data every day.

And their platforms are pretty smart. Senzari COO Demian Bellumio recently explained to me just how deep his company’s MusicGraph service goes to draw semantic connections among the various data points surrounding every song. It uses machine listening algorithms to extract about 4,000 features and 200 megabytes of data per song on things such as chord progression and tempo. The company also analyzes lyrical content, metadata around producers, songwriters and artists, and even readership on partner websites (many of which pay for Senzari’s flagship WahWah music service).

With MusicGraph, Bellumio said, “all this information is now available in one place. … [It’s] pretty much everything you could ever need.” Access to the data starts at $499 a month, which he calls “almost a pass-through of our [Amazon Web Services] bill.”

As of May, The Echo Nest, which was co-founded by MIT-trained machine-listening experts Tristan Jehan and Brian Whitman, was indexing about 10 million documents (e.g., blogs, articles and reviews) a day relating to music. This is on top of the couple of million songs it’s able to analyze per week using its machine listening system. Lucchese said the company was researching all sorts of new capabilities, such as mechanically determining valence (i.e., the mood of a song) and the regional differences in how people classify or access music.

“We’re gonna look back at this time and see that we’re just getting out of the stone age in terms of seeing who you are as a music fan and applying it,” Lucchese said.

Beyond the web, and beyond personalization

The data platforms also provide would-be Pandora competitors an advantage because they can take personalization beyond the digital realm and into a listener’s hard drive, if services offer users the option to upload the contents (or the metadata) of their music folders. This is especially true for Gracenote, which also does some machine listening and other cutting-edge analysis but really specializes in metadata. The company has data on so many songs — more than 130 million — because it has been gathering it for years every time iTunes users or other jukebox users around the world upload obscure tracks to their computers.

This is a powerful proposition in a few regards. One is the ability to provide personalized playlists even when there’s no internet connection. Another is the possibility of making algorithms smarter right away by knowing what listeners already like. For Gracenote, this is where scale really comes into play: The chances are that if a user has it in his collection, Gracenote has metadata on it and can use it to recommend similar songs.

Google and Apple haven’t released a lot of details about how big the catalogs are for Play Music All Access or iTunes Radio, but they also both have access to users’ MP3 collections. (They also have their own scale advantages in terms of brand mindshare, installed user bases and device platforms on which to embed their services.)

And then there are options that the growing collection of music data platforms provides to developers beyond just personalization. All the general music-industry and social data that platforms like The Echo Nest and Senzari gather can be used to create music experiences that go beyond just listening. Music Graph, for example, is available in the Firefox OS app store as a search app that lets users search for music the same way they might search for friends or businesses using Facebook’s Graph Search tool.

A screenshot of the Music Graph app.

More obvious applications might be giving users of streaming services access to more information about the music they’re hearing or, like The Echo Nest does, giving service providers tools to segment their listeners so they can sell space for targeted ads.

Pandora’s different kind of big data

However, it might be premature to assume that services powered by the scale and sophistication of these data platforms will outdo Pandora, especially if there are enough users who really just want a streaming radio service that doesn’t pick bad songs. After all, Pandora’s Bieschke explained, even though his company’s song catalog is much smaller, it does contain a lot of data on those songs.

He would even argue that it’s better data than what competitors and data platforms are generating, in large part because Pandora still relies pretty heavily — almost certainly more so than its competitors — on human judgment. The company maintains a staff of 25 “musicologists” who spend all day listening to new music and mapping it characteristic by characteristic as part of the Music Genome Project. It also employs curators whose job is to listen to new music and find the best stuff to include as part of the streaming service.

Over the years, Pandora and its recommendation algorithms have actually evolved quite a bit. For the first several years of its existence, the company was focused solely on the Music Genome Project, which was the engine that powered recommendations when Pandora first began doing streaming radio in 2005. Around 2007, the company first realized it was sitting on a valuable collection of data about how individual users listened to music.

“It turns out that 35 billion thumbs is a gold mine of data about people’s personal music preferences,” Bieschke said.


All of this data has been augmented by machine learning and, in some instances, machine listening algorithms to help improve personalization. Initially, they were largely trained using data from tracks that humans had analyzed so that the machine could mimic human experts when it came to categorizing music. Around 2010, he said, Pandora began using “ensemble-style” algorithms that use a master algorithm to tie together multiple individual algorithms about listeners’ preferences and behavior. For an in-depth take on Pandora’s techniques, check out this August profile from Fast Company Labs.
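To make the “ensemble-style” idea concrete, here is a minimal Python sketch of a master step that blends several individual scorers. It is not Pandora’s actual method: the scorer functions, the song fields (`tempo`, `popularity`) and the accuracy-based weighting rule are all hypothetical, chosen only to illustrate how thumbs-up and thumbs-down feedback could decide how much each scorer counts.

```python
from typing import Callable, Dict, List, Tuple

Song = Dict[str, float]
Scorer = Callable[[Song], float]  # each scorer returns a value in [0, 1]


def tempo_scorer(song: Song) -> float:
    """Hypothetical scorer: prefers songs near 120 beats per minute."""
    return 1.0 - min(abs(song["tempo"] - 120.0) / 120.0, 1.0)


def popularity_scorer(song: Song) -> float:
    """Hypothetical scorer: passes through a popularity value in [0, 1]."""
    return song["popularity"]


def fit_weights(scorers: List[Scorer],
                feedback: List[Tuple[Song, bool]]) -> List[float]:
    """Master step (assumed rule): weight each scorer by how often its
    prediction (score >= 0.5 means thumbs-up) matches the listener's thumbs."""
    accuracies = []
    for scorer in scorers:
        correct = sum((scorer(song) >= 0.5) == liked for song, liked in feedback)
        accuracies.append(correct / len(feedback))
    total = sum(accuracies)
    if total == 0:
        # No scorer was ever right: fall back to equal weights.
        return [1.0 / len(scorers)] * len(scorers)
    return [a / total for a in accuracies]


def ensemble_score(song: Song, scorers: List[Scorer],
                   weights: List[float]) -> float:
    """Combine the individual scores into one weighted average."""
    return sum(w * s(song) for s, w in zip(scorers, weights))


# Made-up thumbs data: (song, liked?)
feedback = [
    ({"tempo": 120.0, "popularity": 0.2}, True),
    ({"tempo": 125.0, "popularity": 0.1}, True),
    ({"tempo": 30.0, "popularity": 0.9}, False),
    ({"tempo": 200.0, "popularity": 0.3}, False),
]
scorers = [tempo_scorer, popularity_scorer]
weights = fit_weights(scorers, feedback)
candidate = {"tempo": 118.0, "popularity": 0.4}
score = ensemble_score(candidate, scorers, weights)
```

In this toy data the tempo scorer agrees with the listener more often than the popularity scorer does, so it receives the larger weight. A production system would more likely learn the combination with a trained model than with a simple accuracy ratio.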

Benchmarking song selections is a subjective affair, but Pandora’s approach seems to be working fine for now.

The future: Enabling artists and understanding context

Bieschke thinks Pandora has some other advantages that will help it remain relevant as competition picks up, including its relationships with the artists whose music it plays. Eventually, he said, all the online music services will probably be pushing data back to the artists in order to prove streaming music has value beyond just royalty payments. In 2012, during one of many artist backlashes against low royalty rates over the past few years, popular cellist Zoë Keating suggested that data about listeners might actually be more valuable than royalties if it helps artists sell more concert tickets or merchandise.

Bieschke said Pandora already meets with some artists (including, he thinks, Keating) to share the data it has collected about them, including which parts of the country they’re popular in and how listeners rate their music. Right now this is usually an in-person meeting (“It is enough data where you need a little bit of translation,” Bieschke said) but Pandora is looking for ways to automate the process.

If the medium is ever going to get the respect of traditional radio, he said, “We need to publish this data back publicly so people can see the promotional effect of what we’re doing.”

How artists really make money: concerts and merchandise. Source: Flickr/aarontait

Pandora is also looking toward context-aware recommendations as part of the future of streaming radio. Nearly everywhere people go, Bieschke noted, someone is playing some sort of streaming music, which opens the doors for a passive type of interaction that permeates listeners’ lives. As streaming music services make their way into more devices, cars and other platforms, more data accumulates that can help them understand the relationships between people, places and times.

“The act of me walking into the bar will adjust the music to what I like,” he contemplated, or perhaps a service could recognize when listeners are with their children and play family-friendly music. Gracenote is already working on this to a degree by trying to play relevant music based on the sensor data of cars in which it’s installed.

Bieschke acknowledged that social media data, such as Foursquare check-ins perhaps, will likely have to help feed this type of contextual experience, but he also said he’s not always too excited about social data generally. That’s because of one big factor that influences the art of music personalization more than many people would like to admit: embarrassment. In America, especially, people want to keep their guilty pleasures private.

Pandora once integrated with Facebook, Bieschke said: “Overnight, everyone deleted their New Kids on the Block, Backstreet Boys and Britney Spears stations.”

Feature image courtesy of Shutterstock user agsandrew.