Twitter now indexes every tweet ever

Twitter has built a new search index that allows users to surface all public tweets since the service launched in 2006. At nearly half a trillion documents and a scale of 100 times Twitter’s standard real-time index, it’s an impressive feat of engineering.

Twitter partners with IBM to get its data into the enterprise

Twitter and IBM have partnered on a deal that will integrate Twitter data into various IBM software products and cloud services, and will result in a new certification for 10,000 IBM consultants. It’s a good deal for IBM, but probably better for Twitter.

Can a market where consumers sell their data actually work?

Two news stories from Wednesday, one about a startup trying to play data broker between user and website and another about a study into what people would charge for their personal data, offer more evidence that there’s an appetite for a market where consumers sell their data to advertisers and websites. The idea isn’t new (we wrote about its traction back in 2012) and actually has merit because it puts money in consumers’ pockets and higher-quality data in advertisers’ databases. But monetizing the idea might be easier said than done: Enliken, one of the startups we covered in that 2012 piece, appears to have closed its doors.

Twitter visualizes a World Cup shootout in tweets

Twitter has released an analysis of activity on the social network during the overtime shootout period in last week’s World Cup match between Brazil and Chile. The pattern, which Twitter claims has repeated itself through every overtime shootout, is pretty interesting: people tweet like crazy leading up to the kick, watch intently (and with hands off keyboards) as the player gets ready and finally kicks, and then tweet like crazy again after the kick scores or misses. Seeing this phenomenon visualized is a small window into the relationships between our eyes, fingers, televisions and computer screens during big events.

In the rush to hype AI, it’s easy to forget about web search

Microsoft keeps rolling out new features in Bing that it claims make it superior to, or at least more interesting than, Google’s dominant search engine. They’re not the sexiest applications of artificial intelligence, but they are easy, practical and tied to big money.

Repeat after me: ‘Google is not a proxy for big data’

Another study is reporting on the inaccuracy of the Google Flu Trends project, which predicts seasonal flu rates based on search data. However, Google’s algorithms don’t constitute the “big data” approach to this issue; they’re just one piece of a smart big data approach.

On the importance of direct customer interaction

Direct interaction with customers is key to the understanding and responsiveness a company needs to serve them better. That may seem obvious, but it is easy for a business of almost any scale to outgrow the forced contact that assures that leadership in all departments benefits from such interaction. Directly gained insight from customers is vital not only to sales and marketing, but also to R&D, engineering, distribution, and, with more IT reaching customers and suppliers, IT leadership as well. Social data can’t replace direct communication with customers, but it can, increasingly, augment it.

Gigaom Research social curator Stowe Boyd has an interesting post, “Shaping companies to get closer to customers,” that touches on some of the research on the value of customer-signal information and its implications for the social-generation enterprise.



DataSift introduces VEDO, a means to structure and analyze social data

I got a heads-up from Nick Halstead, the founder and CEO of DataSift, the social data company, about some news that’s going live today.

Nick frames the problem confronting companies that want to dig into the rich possibilities latent in social data (from the company blog):

“Social Data is very much a Big Data problem – the data generated each day is beyond the reach of 99.99% of businesses and to store even a fraction of it is a challenge. Second the content itself is in the most part unstructured. If you look at a Tweet – there is almost nothing you can do to it in a purely analytical sense other than count it. For two years I have been wrestling with how we could make both understanding the data simple and to bring it into context for business.”

DataSift has created very sophisticated means to pull in, analyze, and quantify social data from sources like Twitter, but providing tools that non-programmers can use has been challenging.

Enter VEDO

“So today we are announcing VEDO – an extension of our core platform that brings programmable intelligence to the masses. Building upon our incredibly rich text pre-processing and parsing capabilities, we have added a whole new engine that allows customers to take advantage of advances in machine learning, statistical models, rich taxonomies and much more all through a simple and unified approach. As with the rest of our platform, we want to reduce the cost of developing this kind of functionality for our customers and let them focus on innovation and not on infrastructure.

“VEDO brings the power to understand the context and the meaning of the content itself. It can be trained to understand any subject and to contextualize it so that the data can be inherently joined to other structured data within the business. This to me goes to the heart of the value of Social – bringing it together with other business data to set it in context and allow customers to understand why and how Social is impacting them and be able to make decisions off the back of it.”

Consider applying machine learning to unstructured enterprise social data as the backbone of a new generation of work management tools. The algorithmic ‘engines of meaning’ might augment or replace the ‘collaboration’ architecture of human-defined project spaces, access controls, and org chart-based sharing schemes, relying instead on intelligent agents that automatically manipulate our online workspaces on the fly and pull information from whatever sources may be relevant to the task at hand, for everyone.

The third way of work makes new technologies core to the conduct of business: we are moving to a model where — for the first time, really — our tools will no longer emulate pre-computer era ways of working, but will break free of those restrictions.

I hope to interview Nick in the next few weeks and learn more about how DataSift’s clients are using the tool and what’s on the horizon for social data analysis.