10 ways big data changes everything

Can gigabytes predict the next Lady Gaga?

By Stacey Higginbotham
Want to know how playing on Jimmy Kimmel Live will boost the sales of an artist’s album? Or how about figuring out where fans go to find artists after they hit the evening news? What about the effect Whitney Houston’s death had on her YouTube and Vevo plays? They shot up 4,525 percent, by the way.
If you want to know this and other music industry data gleaned from the Internet, then you want to turn to Next Big Sound, which exists to find the connection between social activity and music sales.
The service, which recently raised $6.5 million, began two years ago because its founders thought the influx of data — from social networks like MySpace and Twitter, online music services such as Rdio, and sales sites — might help them understand how someone transitions from being a member of a band to being a full-fledged rock star.
The site pulls in 5–10 GB per day, with peaks of about 100 GB on the heaviest days, from the usual suspects such as Facebook, Last.fm, Rdio, iTunes and more. Some of this data, such as that from the music services and sales sites, is structured and accessed via an API, and thus is easy to deal with. Other data, like that gleaned from blogs, Facebook pages or Twitter, is scraped from pages and sites and needs some formatting before the data geeks at Next Big Sound can make sense of it.
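To make the split between API data and scraped data concrete, here is a minimal Python sketch of how both kinds of input might be reduced to one common record shape. It is an illustration only, not Next Big Sound's actual pipeline: the field names (`artist_name`, `service`, `count`), the record layout and the sample page text are all assumptions made up for this example.

```python
import re
from datetime import date

def from_api(record):
    """Structured source: the fields are already labeled, so just rename and type them.
    The input keys used here are hypothetical, not any real service's API."""
    return {
        "artist": record["artist_name"],
        "source": record["service"],
        "metric": record["metric"],
        "day": date.fromisoformat(record["date"]),
        "value": int(record["count"]),
    }

# Matches counts such as "12,345 likes" in raw page text (assumed format).
_COUNT = re.compile(r"(\d[\d,]*)\s+(likes|followers|plays)", re.IGNORECASE)

def from_scrape(artist, source, day, page_text):
    """Scraped source: the count has to be dug out of free text and cleaned up."""
    match = _COUNT.search(page_text)
    if match is None:
        return None  # nothing recognizable on the page
    return {
        "artist": artist,
        "source": source,
        "metric": match.group(2).lower(),
        "day": day,
        "value": int(match.group(1).replace(",", "")),
    }
```

Once both paths produce the same dictionary, the downstream analysis no longer needs to care whether a number arrived through an API or was scraped off a fan page.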
Next Big Sound uses Cassandra for larger, time-oriented data sets; MongoDB for medium-sized, semi-structured data sets; MySQL for small, well-structured data; and Apache Hadoop + Pig for offline analytics. Alex White, the CEO and founder of Next Big Sound, didn’t go into more tech specifics, but he did get excited about the new sources of data and how they can change the industry. “We want to unlock the black box of how an artist becomes a star,” White said. “We want to reverse engineer the Billboard charts and understand the key actions and moments that can turn a garage band into superstars.”

Beat of big data

The music industry is ripe for a data infusion. Major labels have their own Hadoop clusters and attempt to track how their artists perform on fan pages and, of course, in record sales. For decades, music sales were tracked by managers calling up record stores, but in 1991 Nielsen SoundScan entered the scene with accurate CD sales information. Billboard, the industry’s trade magazine, realized that accuracy was the way to go and used the SoundScan data in its charts. The next time Billboard added charts, the data came from Next Big Sound.
“Billboard realized it was missing the ways that people were listening to music now,” said White. The music industry had to understand how the explosion of social media affected its core metrics — sales of songs and albums — and nothing was out there. That’s where Next Big Sound comes into play, but fundamentally its goal is larger: It is to learn via scads of data how to make a star.
Next Big Sound has two undisclosed major record labels as customers so far, and it generates two charts for Billboard: the Social 50 and Next Big Sound’s Up and Coming Artist list. Last year it also published “The state of online music in 2011,” an infographic and report chock-full of stats, including that almost 65 billion songs were played across the sites that Next Big Sound tracked in 2011 and that video plays on YouTube peak on Thursday. Also Lady Gaga is big. Everywhere.
But the big money isn’t in trivia; it’s in the insights. And Next Big Sound delivers value above and beyond the record label’s own data-tracking efforts by looking across the entire music industry spectrum to help music professionals allocate their resources around a particular artist. For example, if a manager notices a singer who shares many of the same characteristics as one of her own clients, she might investigate how having a YouTube page has affected that artist. This way the manager can spend her promotional dollars and time more efficiently.
Labels can also run an artist’s songs on YouTube to see which one the label should promote for radio. Tracking YouTube plays or comments made there and on music-focused networks might indicate if the label has a potential hit on its hands. White’s hope is to help provide that comparison.
What Next Big Sound can’t yet do, however, is provide context. For example, White says he can show an executive that Chris Brown has 20,000 new likes on Facebook, which might seem good until you realize that is flat compared to all the other Grammy-affiliated artists in the week heading into the awards. But for now, he can’t say why that is. White points out his job isn’t to assess popularity exactly but to say what that popularity means for the record industry.
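The Chris Brown example comes down to simple arithmetic: a raw count means little until it is set against a peer group. The Python sketch below shows one way such a comparison might be made, by dividing an artist's weekly gain by the median gain of comparable artists. Only the 20,000 figure comes from White; the peer numbers are invented purely for illustration, and the median-ratio method is an assumption, not something Next Big Sound has described.

```python
from statistics import median

def relative_to_peers(artist_gain, peer_gains):
    """Return the artist's gain as a multiple of the peer group's median gain."""
    baseline = median(peer_gains)
    if baseline == 0:
        raise ValueError("peer median is zero; ratio is undefined")
    return artist_gain / baseline

# 20,000 new likes is the figure White cites; the peer gains are hypothetical.
peer_gains = [18_000, 55_000, 72_000, 90_000, 120_000]
ratio = relative_to_peers(20_000, peer_gains)
print(f"{ratio:.2f}")  # 0.28
```

A ratio well below 1 would mark a week that looks healthy in isolation but lags the field. What it cannot do, as White notes, is say why.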
“We don’t speculate why fans aren’t liking Chris Brown. We want to know what the business impact is. If they aren’t liking, are they still buying?” asks White. That’s what the record industry wants to know, and that’s what the data shows. For White, this isn’t a social experiment but a hunt for actions that an artist or label can take to produce someone who can sell albums and fill concert halls. For insights into how and why we are drawn to a particular artist, the data and Next Big Sound will stay silent.