Why Guavus analyzes lots of telecommunications data before storing it all

It’s not unusual to think that if data scientists want to analyze data, the first step is to collect it and spend a lot of time looking at it: asking questions, refining data sets and then getting some possible answers. But at Guavus, the emphasis is on analyzing petabytes of data as soon as it arrives in order to deliver real-time results, Anukool Lakhina, the company’s founder and CEO, told attendees at GigaOM’s Structure:Data conference on Thursday.
A decade ago, when Lakhina worked at Sprint Labs, Sprint employed deep-packet-inspection probes to collect information about how subscribers were using the telecommunications company’s services. It was a good idea: “if we knew how they were interacting, we’d be invisible, we’d know everything about our business,” Lakhina said. But the data couldn’t really be harnessed quickly. The storage arrays sitting next to the probes filled up quickly, and FedEx trucks drove around the Sprint network to pick them up. Engineers jokingly referred to the process as the “package-switch network,” rather than a packet-switch network, Lakhina said. Once the data was collected, researchers reviewed it when it was roughly a day old and matched it with other data. They reported their findings and were roundly turned away, because the data was, well, dated.
Guavus, founded in 2006, automates the FedEx model, so telcos can derive insights from data immediately. Guavus offers its customers customizable dashboards with the self-service simplicity of a consumer application.
The change in thinking from store first to compute first has led to clear return on investment, at least as Guavus has applied it. One service provider using Guavus discovered that some cab drivers who were supposed to be using the network for credit card transactions were actually using it to carry live video streams. The use violated the end users’ contract terms and resulted in renegotiations. Another Guavus customer used the product during customer-care calls to explain why end users were getting charged extra for large data use. Data from Guavus can also let customers pass usage information down to end users through self-service portals.
These are $100 million problems, Lakhina said. “And you don’t need to do a lot of hunting around to discover these big use cases,” he said.
Video of this Structure:Data 2013 session: http://www.youtube.com/watch?v=G337lSQu2_g