Real-time curation in the corporation?

We are experiencing a rapid adoption of stream-based communication tools in the enterprise, and these are introducing a new era of information explosion, one that shares similarities with what has gone on in the business setting before, but on a completely unprecedented scale.
The social revolution in the workplace has many ramifications (changes in management approaches, reliance on new technologies, and so on), but from a social network analysis viewpoint we can point at one factor that acts as a proxy for a great deal of the other impacts of social tools: social density. As people adopt and become more habituated to social tools, the number of social relationships increases. And this means more conversations, more connection, and more information streaming through these networks.
All very interesting, but how are we to make sense of dramatically increased information flows?
Bruce Sterling, the science fiction writer, suggested that we’d be relying on algorithmic machinery:

Ultimately no human brain, no planet full of human brains, can possibly catalog the dark, expanding ocean of data we spew. In a future of information auto-organized by folksonomy, we may not even have words for the kinds of sorting that will be going on; like mathematical proofs with 30,000 steps, they may be beyond comprehension. But they’ll enable searches that are vast and eerily powerful. We won’t be surfing with search engines any more. We’ll be trawling with engines of meaning.

And that turns out to be how Google crunches the web. But for real-time sense making, Twitter has found it needs to get people in the loop. After all, when something new begins to become a hot trend, algorithms don’t necessarily have enough context to figure out what is going on, and search won’t do the right thing:

Edwin Chen and Alpa Jain, “Improving Twitter search with real-time human computation,” via Twitter Engineering Blog:
From a search and advertising perspective, however, these sudden events pose several challenges:

  1. The queries people perform have probably never before been seen, so it’s impossible to know without very specific context what they mean. How would you know that #bindersfullofwomen refers to politics, and not office accessories, or that people searching for “horses and bayonets” are interested in the Presidential debates?
  2. Since these spikes in search queries are so short-lived, there’s only a small window of opportunity to learn what they mean.

So an event happens, people instantly come to Twitter to search for the event, and we need to teach our systems what these queries mean as quickly as we can — because in just a few hours, the search spike will be gone.
How do we do this? We’ve built a real-time human computation engine to help us identify search queries as soon as they’re trending, send these queries to real humans to be judged, and then incorporate the human annotations into our back-end models.
Before we delve into the details, here’s an overview of how the system works.

  1. First, we monitor for which search queries are currently popular.
    Behind the scenes: we run a Storm topology that tracks statistics on search queries. [Storm is a distributed system for real-time computation.]
    For example, the query “Big Bird” may suddenly see a spike in searches from the US.
  2. As soon as we discover a new popular search query, we send it to our human evaluators, who are asked a variety of questions about the query.
    Behind the scenes: when the Storm topology detects that a query has reached sufficient popularity, it connects to a Thrift API that dispatches the query to Amazon’s Mechanical Turk service, and then polls Mechanical Turk for a response.
    For example: as soon as we notice “Big Bird” spiking, we may ask judges on Mechanical Turk to categorize the query, or provide other information (e.g., whether there are likely to be interesting pictures of the query, or whether the query is about a person or an event) that helps us serve relevant Tweets and ads.
  3. Finally, after a response from an evaluator is received, we push the information to our backend systems, so that the next time a user searches for a query, our machine learning models will make use of the additional information. For example, suppose our evaluators tell us that [Big Bird] is related to politics; the next time someone performs this search, we know to surface ads by @barackobama or @mittromney, not ads about Dora the Explorer.

So, they quickly move the sense-making for interest in Big Bird or Clint Eastwood to humans via Mechanical Turk, to compensate for recent shifts in the reason for interest in a trending topic.
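The three steps Twitter describes (detect a spiking query, send it to human judges, store their answers for later searches) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Twitter's implementation: the `SpikeDetector` class, its thresholds, and the `ask_human_judges` function are all hypothetical, and the canned answer stands in for a real call to a crowdsourcing service such as Mechanical Turk.

```python
from collections import Counter, deque


class SpikeDetector:
    """Flags a query whose count in the current window far exceeds its recent average."""

    def __init__(self, history=3, ratio=5.0, min_count=10):
        self.ratio = ratio                  # multiple of the baseline that counts as a spike
        self.min_count = min_count          # ignore queries with too little volume
        self.past = deque(maxlen=history)   # counts from the most recent completed windows
        self.current = Counter()            # counts for the window in progress

    def observe(self, query):
        self.current[query.lower()] += 1

    def close_window(self):
        """End the current window and return the queries that spiked in it."""
        spiking = []
        for query, count in self.current.items():
            baseline = sum(w[query] for w in self.past) / max(len(self.past), 1)
            if count >= self.min_count and count > self.ratio * max(baseline, 1.0):
                spiking.append(query)
        self.past.append(self.current)
        self.current = Counter()
        return sorted(spiking)


def ask_human_judges(query):
    """Hypothetical stand-in for dispatching a query to human evaluators."""
    canned = {"big bird": {"category": "politics", "is_person": False}}
    return canned.get(query, {"category": "unknown"})


annotations = {}  # what the back-end models would consult on the next search
detector = SpikeDetector()

# A quiet window, then a window in which "Big Bird" suddenly takes off.
for q in ["weather", "Big Bird", "weather", "weather", "weather"]:
    detector.observe(q)
detector.close_window()

for q in ["Big Bird"] * 40 + ["weather"] * 4:
    detector.observe(q)
for query in detector.close_window():
    annotations[query] = ask_human_judges(query)

print(annotations)
```

The point of the design is that the algorithm only decides when to ask; the meaning of the query comes from people.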
I think this sort of human curation will be increasingly important in the business context. For example, if a competitor’s product XJ11 is being mentioned all over the social network within your company, understanding why may be critical. Perhaps they have just announced a big sale, or are selling the product line off. But the new spike in activity is likely to be related to very recent events, and not the long tail of older pieces of information about that product. And the most likely candidates to help make sense of the new trend? The people originating the cascading comments in the social communications tools.
Vendors of work media and social marketing tools will have to support social curation and getting people into the loop. Twitter’s approach is one way to go, although instead of using Mechanical Turk, a system could ask the originators of these trends for clarification: it might just message the earliest trendsetters to ask them what’s up. Alternatively, an approach like Tumblr’s editorial teams might be employed, where individuals are selected to act as editors or curators, and to actively pull out information that is novel and important, and post it onto topic pages.
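As a rough illustration of the first option, here is a minimal Python sketch of how a tool might pick out the earliest people to mention a trending term so they can be asked for context. Everything in it is an assumption made for the example: the `earliest_originators` function, the post structure, and the sample posts (names, times, and text) are invented, and a real product would pull posts from its own activity stream.

```python
from datetime import datetime


def earliest_originators(posts, term, k=3):
    """Return up to k distinct authors, in order of their first mention of `term`."""
    term = term.lower()
    authors = []
    for post in sorted(posts, key=lambda p: p["time"]):
        if term in post["text"].lower() and post["author"] not in authors:
            authors.append(post["author"])
            if len(authors) == k:
                break
    return authors


# Invented sample data for the XJ11 scenario.
posts = [
    {"time": datetime(2012, 11, 5, 9, 30), "author": "dana", "text": "Anyone else see the XJ11 news?"},
    {"time": datetime(2012, 11, 5, 9, 5), "author": "lee", "text": "XJ11 is on sale at half price"},
    {"time": datetime(2012, 11, 5, 9, 45), "author": "lee", "text": "More on XJ11 pricing"},
    {"time": datetime(2012, 11, 5, 10, 0), "author": "sam", "text": "Lunch plans?"},
    {"time": datetime(2012, 11, 5, 10, 15), "author": "kim", "text": "xj11 might hurt our Q4"},
]

to_ask = earliest_originators(posts, "XJ11", k=2)
print(to_ask)  # ['lee', 'dana']
```

The tool would then message those people with a short question, much as Twitter sends a spiking query to its judges.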
Whichever approach is taken, I have no doubt that we will see social curation emerging as a key component of our business social networking tools in the near future. And we will still be relying on human beings as the key element of sense making, not machinery.

As the firehose matures, Twitter tightens grip on valuable asset

PeopleBrowsr, a company that provides marketing analytics based on Twitter’s full stream of data, known as the firehose, is suing Twitter for access to that stream. Twitter’s move to close down who has access to the firehose shows where the company is headed.

Today in Social

Rounding up the post mortems on Twitter’s announcements. Colleen Taylor says Twitter’s photo sharing service will bring new life to Photobucket, the company that’s powering it. The product demo shows some nice search and hashtag integration for discovery, but no evidence of advertising opportunities for Twitter. And it’s aimed less at collections and more at real-time photos – there were 2 million photo links in tweets on May 30 – and thus, at Twitpic and Yfrog, rather than Facebook or Instagram. Matthew Ingram thinks Twitter’s search improvements still have a ways to go. You still can’t search an archive older than a week, and it’s not very clear how Twitter’s personalization and relevancy ranking works. Meanwhile Darrell Etherington scoffs at some who think Apple would build a Twitter competitor. I don’t see how Apple would monetize such a thing any better than Twitter does, and if it wanted to increase habitual usage of its hardware, it should just integrate Twitter more deeply into the iPhone. Which it may be doing.

Real-time Search Better for News Than Products

An eye-tracking report from OneUpWeb rightfully compares the challenge of real-time search to Heisenberg’s Uncertainty Principle. But it did find that users are already responding to real-time results, especially when they’re seeking out news. Those seeking products clicked less and found real-time results less useful.

How Google, Yahoo and Microsoft Think About Real-Time Search

It’s pretty amazing that raw Twitter posts already show up by default right on Google search results pages. Today at the Search Marketing Expo, project managers from the three major search engines gave insight into their companies’ approaches to the quickened pace of the web.

Twitvid.com Launches Twitter Video Search

Twitter video service Twitvid.com today launched a real-time search engine for videos shared on Twitter. Twitvid not only tracks videos shared through its own service, but any YouTube link shared on Twitter.

YouTube Comment Search Battle: “Sucks” Edges Out “Rocks”

YouTube is playing around with real-time comment search in the vein of web darling Twitter. The new YouTube “test tube” feature provides a continuously updating list of current comments on the site and surfaces popular overlaps of conversation as trending topics.
Because YouTube is such a popular site, it’s quite possible that a live-updated list of trending topics (a sampler: “flashing lights,” “danny gokey it’s only,” “kitten” and “lloyd doggett”) could give us a sense of the cultural zeitgeist. But YouTube comments can’t escape their reputation for being particularly silly and nonsensical (case in point: this excellent McSweeney’s feature: “YouTube comment or e.e. cummings?”).
So — not to be snarky, I swear! — on Friday I ran a little test, searching YouTube comments for real-time mentions of the terms “sucks” and “rocks.” I actually had to do this multiple times, as it turns out YouTube stops counting after it crosses 100 new comments. But in almost exactly an hour, I was able to get a very unscientific window into the sentiment of the general YouTube population.
1:44 p.m.: Started near-simultaneous searches for “sucks” and “rocks”
1:54 p.m.: 10 minutes in, “rocks” has 15 fresh entries and “sucks” has 21.
2:14 p.m.: The split is widening; 41 “rocks” to 59 “sucks.”
2:34 p.m.: 65 “rocks,” 87 “sucks”
2:44 p.m.: “Sucks” has crossed 100, and now says “More than 101 comments since you started searching.” “Rocks” is dragging with just 74.
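For anyone who wants to check the arithmetic, the short Python snippet below turns the tallies above into per-minute rates and a “sucks” share of the combined total. The numbers are copied from the log; the final 2:44 p.m. reading is left out because the “sucks” counter had stopped counting by then, so its true value is unknown. The variable names are just for this example.

```python
# (minutes elapsed since 1:44 p.m., "rocks" count, "sucks" count)
tally = [
    (10, 15, 21),
    (30, 41, 59),
    (50, 65, 87),
]

shares = []
for minutes, rocks, sucks in tally:
    share = sucks / (rocks + sucks)   # fraction of all matches that said "sucks"
    shares.append(share)
    print(
        f"{minutes} min: sucks share {share:.0%}, "
        f"{sucks / minutes:.1f} sucks/min vs {rocks / minutes:.1f} rocks/min"
    )
```

Across the three usable checkpoints, “sucks” holds steady at roughly 57 to 59 percent of the combined mentions.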