Entrepreneurs from six big data startups took the stage Wednesday at GigaOM’s Structure:Data conference to share insights on the industry as a whole. Taken together, their talks give a sense of the ideal way to crunch big data in an enterprise or any other organization with large data sets on its hands.
- Just because you have a lot of data doesn’t mean you’re doing a good job of acting on it. Numenta CEO Rami Branitzky made the point with an example: Data scientists working at utility companies might act on just 0.5 percent of their data, and it might take them three weeks to build a model, let alone deploy it. A better solution, Branitzky said, would derive insights as fast as the data streams in, just as the brain processes information pretty much as soon as a person captures it through the five senses.
- Sure, Hadoop is hip and hefty (just ask my colleague Derrick Harris, who recently wrapped up a four-part series on it), but it ain’t necessarily easy for statistics-savvy data scientists, who are familiar with quick-and-dirty programming languages such as Python, to wrangle data with Hadoop in Java, said Doug Daniels, chief technology officer of Mortar Data. Hence the company’s offering: Hadoop that can be deployed through Python, which could make more sense for certain customers.
- Airlines have modernized pilot dashboards over the years, although the many iterations haven’t necessarily added new measurements for pilots to keep track of, said Stephen Messer, co-founder and vice chairman of Collective[i]. Instead, the companies put the information most relevant to pilots at any given moment right in front of their eyes. “Is this the best technology out there? No. It’s taking existing technology and reutilizing it,” Messer said. Similarly, his company seeks to give customers existing technology that’s easily accessible and therefore very powerful.
- Asking questions of your data is only effective if you know the right questions to ask. But what if you don’t? Arijit Sengupta, CEO of BeyondCore, showed off his company’s answer to that question: software that quickly computes thousands of options based on all available variables, displays the results as charts and actually talks to you to identify the biggest drivers of, say, profit.
- The number of “open-data APIs” that can provide data freely to the public has grown in the past five or six years from fewer than 100 to more than 8,000, said Sharmila Shahani-Mulligan, founder and CEO of ClearStory Data. Companies should be able to take advantage of all those publicly available data sets by easily crossing them with privately held data to draw new insights, she said.
- As Ayasdi co-founder and CEO Gurjeet Singh sees it, the popular word “insight” should have a commonly accepted definition. He proposed one: an actionable truth about a problem discovered from data. By “actionable,” he meant that it should be compact, because otherwise it’s unlikely that anyone will act on it. As for “truth,” the finding can’t be random. “In large data sets, it’s easy to find whatever you want to find,” Singh said. There must be statistical proof bearing out a theory. And it must be “discovered” as a result of a customer’s questions.
Check out the rest of our Structure:Data 2013 coverage here. A video of the session follows below: