If Twitter isn’t the most valuable data source around, it’s at least the most flexible

The amazing thing about Twitter is how quickly its day-to-day users became the group that stands to gain the least from the platform. That's not an indictment of Twitter (its users can still get a lot out of being able to share and communicate with networks of people far beyond their natural reach) but an assertion about the power of its data. Whatever value Twitter offers on a personal level is dwarfed by the value its immense data footprint holds for professionals combing it for information.
The other remarkable thing about Twitter as a data source is how much its value varies depending on whom you ask. For some, it's the real-time nature of the platform. For others, it's the ability to spot trends or analyze sentiment on certain topics, and for others still, it's the array of ideas that flow across the platform. Put simply, Twitter is a wealth of information ripe for the picking, if you know how to find the fruit.
Dataminr has become one of the more successful companies to build its business largely around analyzing Twitter data, turning its algorithms loose on the Twitter firehose to highlight key events for traders, public officials and, now, journalists. With that in mind, I'm pleased that Dataminr founder and CEO Ted Bailey will be one of a handful of innovators speaking as part of the Data Lab sessions at our Structure Data conference in March. We'll have more information on Data Lab and a list of final speakers soon, but the nut is that it's a series of tech talks highlighting techniques and strategies for dealing with new types of data in new ways.
I spoke with Bailey last week after the announcement of Dataminr for News (check out my colleague Mathew Ingram’s coverage here, or watch the video below that was shot at an event with partners CNN and Twitter) and was impressed with how well the company makes use of the metadata associated with each tweet. I was also struck by how it balances the different levels of time sensitivity associated with important breaking news versus an emerging trend and delivers information in accordance with its urgency.
Video: Announcing Dataminr for News, from Dataminr on Vimeo (player.vimeo.com/video/85445869).
Dataminr's process starts by ingesting every tweet in real time and assigning each one a score based on factors such as the author's reputation, the topics the tweet references and the type of language it uses. Next, Dataminr's algorithms cluster tweets using those scores along with variables such as geolocation, propagation patterns, timeliness and the velocity at which topics are surfacing. This is how Dataminr identifies potentially noteworthy happenings.
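To make those two stages concrete, here is a toy sketch in Python. It is not Dataminr's system, which Bailey says relies on machine learning; it swaps in a hand-weighted score and simple grid-and-time-window clustering. The `Tweet` fields, the `URGENT_TERMS` vocabulary, the weights and the thresholds are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Tweet:
    text: str
    user_reputation: float  # hypothetical author score in [0, 1]
    topic: str              # hypothetical topic label, assumed pre-extracted
    lat: float
    lon: float
    timestamp: float        # seconds since the Unix epoch

# Hypothetical vocabulary suggesting someone is reporting an event first-hand.
URGENT_TERMS = {"breaking", "explosion", "shooting", "evacuate"}

def score_tweet(t: Tweet) -> float:
    """Stage 1: score one tweet from author reputation and language (toy weights)."""
    words = set(t.text.lower().split())
    language_signal = 1.0 if words & URGENT_TERMS else 0.0
    return 0.6 * t.user_reputation + 0.4 * language_signal

def cluster_key(t: Tweet, cell_deg: float, window_s: int) -> tuple:
    """Group tweets sharing a topic, a lat/lon grid cell and a fixed time window."""
    return (t.topic, int(t.lat // cell_deg), int(t.lon // cell_deg),
            int(t.timestamp // window_s))

def candidate_events(tweets, min_score=2.0, min_velocity=0.5,
                     cell_deg=0.5, window_s=600):
    """Stage 2: cluster tweets, then keep clusters that are both strong and fast-moving."""
    clusters = defaultdict(list)
    for t in tweets:
        clusters[cluster_key(t, cell_deg, window_s)].append(t)

    events = []
    for key, members in clusters.items():
        total_score = sum(score_tweet(t) for t in members)
        velocity = len(members) / (window_s / 60)  # tweets per minute in the window
        if total_score >= min_score and velocity >= min_velocity:
            events.append((key, total_score, velocity))
    # Strongest candidate events first.
    return sorted(events, key=lambda e: e[1], reverse=True)
```

Bucketing by a fixed grid cell and time window is about the simplest way to turn geolocation and timeliness into a grouping key; a production system would likely use learned features and streaming or density-based clustering instead.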
The algorithms then "systematically compare [each event] to the list of past situations that have happened over the course of Twitter," Bailey explained. Essentially, they try to gauge how important a particular event, trend or piece of news is by comparing it with how similar news has spread in the past. Finally, Dataminr's system determines how aggressively to send each alert, and to whom. Financial services customers, for example, likely got word immediately that BlackBerry was abandoning its CEO search, while smaller items or trends might arrive in an hourly email. It's all done using machine learning, Bailey said, and it's all done in seconds. (The company has an infographic that illustrates the whole process.)
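The comparison-and-routing step can be sketched the same way, under heavy assumptions. Here a k-nearest-neighbour lookup stands in for whatever models Dataminr actually uses: a new event's importance is estimated as the average importance of the most similar past events, and that estimate picks a delivery channel (an immediate push or an hourly digest) plus the subscribers who follow the topic. `PastEvent`, its feature layout, the importance labels and the 0.8/0.4 cut-offs are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class PastEvent:
    # Hypothetical features, assumed pre-normalised to [0, 1]:
    # (spread velocity, geographic breadth, mean author reputation).
    features: tuple
    importance: float  # hypothetical after-the-fact importance label in [0, 1]

def estimate_importance(features, history, k=3):
    """k-nearest-neighbour estimate: mean importance of the k most similar past events."""
    nearest = sorted(history, key=lambda p: math.dist(features, p.features))[:k]
    if not nearest:
        return 0.0  # no history to compare against
    return sum(p.importance for p in nearest) / len(nearest)

def route_alert(importance, topic, subscribers):
    """Choose a delivery channel by urgency and recipients by topic subscription."""
    if importance >= 0.8:
        channel = "immediate"       # e.g. a push alert within seconds
    elif importance >= 0.4:
        channel = "hourly_digest"   # e.g. bundled into an hourly email
    else:
        return None, []             # not worth alerting anyone
    recipients = [name for name, topics in subscribers.items() if topic in topics]
    return channel, recipients
```

Nearest-neighbour lookup is used here only because it is the most literal reading of "compare to past situations"; it depends heavily on how features are scaled, which is why the sketch assumes they are already normalised.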
However, Dataminr goes a step further by exposing some of the underlying data so users can decide what to do with the information. They get a breaking-news version of Pandora's "you're hearing this because" explanation, complete with maps showing where the information is coming from across time and space. During an event like the recent shooting at a Maryland mall, for instance, local news outlets might want to get someone on the scene immediately, while national outlets might use the early warning to verify the story and track down sources before airing it.
“Twitter is where information breaks today,” Bailey said. “However, Twitter is also the network through which everything propagates and everything spreads.”
It's also where we find who's connected to whom, what people think about the Grammys, which topics really struck a nerve during the State of the Union address and, for brands, how people really feel about new products. All of that is possible when you understand the types of data Twitter can provide, and none of it is really possible for any given user, no matter how obsessed, sitting in front of TweetDeck and watching their own streams go by.
Thumbnail image courtesy of Pond5/Cienpies