How Google, Yahoo and Microsoft Think About Real-Time Search

Perhaps inspired by the speed of the medium, the integration of real-time tweets and other updates into the major search engines has happened more quickly than I might have expected. It’s pretty amazing that raw Twitter posts already show up by default on Google (s GOOG) search results pages (they’re a little more buried on Bing (s MSFT) and Yahoo (s YHOO), but still quite prominent, and those features also launched only in the last couple of months). Today at the Search Marketing Expo in Santa Clara, Calif., product managers from the three major search engines gave insight into their companies’ approaches to the quickened pace of the web.
BING: Following a hearty endorsement by Microsoft CEO Steve Ballmer in a keynote interview — “I’ve fallen in love with our real-time search; there’s nothing better than our Bing Twitter search” — Sean Suchter, general manager of Microsoft’s Search Technology Center, talked about the value of analyzing not only tweets but links shared on Twitter and Twitter user sentiment about trending topics.
Bing at this point uses only Twitter for real-time search, though it’s supposed to have a deal with Facebook to integrate public status updates. Suchter had no comment as to when that deal would be implemented, but said the Bing team is evaluating how to surface the many different ways Facebook users communicate, including the signal users send about other people’s relevant real-time updates when they say they like them.

Suchter showed a cool graph contrasting the network of tweets on an organic topic (the conference we were attending) with the network on a spam topic (teeth whitening). It’s pretty easy to see the difference.
Suchter didn’t get very specific about Bing’s real-time special sauce, but he said one of the most interesting ways his team improves Bing real-time search is by looking at the past. The team takes snapshots of the information available in the world at a given moment, evaluates what the biggest story was, and tries to figure out how Bing could have surfaced it.
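To make that retrospective idea concrete, here is a minimal Python sketch. It is not Bing’s method; it assumes a saved snapshot is just a list of topic labels and that the engine’s output for the same window is a ranked list of topics. The function name `retrospective_check` and both inputs are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def retrospective_check(snapshot_mentions: List[str],
                        surfaced: List[str]) -> Tuple[Optional[str], Optional[int]]:
    """Hypothetical replay of a past time window.

    snapshot_mentions: topic labels seen in a saved snapshot (assumed input).
    surfaced: topics the engine actually showed for that window, best first.
    Returns the biggest topic and the 1-based rank at which it was surfaced,
    or None for the rank if it never made the list.
    """
    counts = Counter(snapshot_mentions)
    if not counts:
        return None, None
    # "Biggest thing" is approximated here as the most-mentioned topic.
    biggest, _ = counts.most_common(1)[0]
    rank = surfaced.index(biggest) + 1 if biggest in surfaced else None
    return biggest, rank
```

For example, `retrospective_check(["quake"] * 5 + ["sxsw"] * 3, ["sxsw", "quake"])` returns `("quake", 2)`: the biggest story was surfaced, but only in second place, which is the kind of gap such a review would flag.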
YAHOO: Yahoo’s Ivan Davtchev, senior product manager of search, gave a bit more insight into how that company is building a model to determine which tweets are relevant. He said Yahoo emphasizes speed in real-time search but also allowed that freshness can be a deceptive measure of real-time success. Yahoo, which is rolling out real-time search on many of its properties, has built an internal tool called TimeSense to determine which topics are spiking in real time. It uses language models to group words and then compares them against the periods before and after. Since real-time updates often don’t include “anchor text” that directly tells search engines what they’re about, promotional factors become more important.
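A simplified way to picture the spike detection described above: count how often each term appears in the current window, compare that with its frequency in the surrounding windows, and rank terms by the ratio. The Python sketch below swaps in plain term-frequency ratios with add-one smoothing for TimeSense’s language models; the function name `burst_scores` and the `min_count` threshold are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Window = List[List[str]]  # a window is a list of tokenized updates


def burst_scores(before: Window, current: Window, after: Window,
                 min_count: int = 3) -> List[Tuple[str, float]]:
    """Rank terms by how much more frequent they are in `current`
    than in the combined `before` and `after` windows."""
    def freqs(window: Window) -> Tuple[Counter, int]:
        counts = Counter(tok for doc in window for tok in doc)
        return counts, sum(counts.values())

    cur_c, cur_n = freqs(current)
    base_c, base_n = freqs(before + after)
    vocab = len(set(cur_c) | set(base_c))
    scores: Dict[str, float] = {}
    for term, n in cur_c.items():
        if n < min_count:  # ignore terms too rare to call a spike
            continue
        p_cur = n / cur_n
        # Add-one smoothing so terms unseen in the baseline don't divide by zero.
        p_base = (base_c[term] + 1) / (base_n + vocab)
        scores[term] = p_cur / p_base
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With a baseline of updates about "search news" and a current window of "search quake", `quake` scores far above `search`, because `search` is common in every window while `quake` appears only now.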
Davtchev spoke more specifically about what Yahoo considers real-time spam: content with “multiple buzzing terms,” overuse of URL-shortening services and overuse of hashtags.
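Those three spam signals are simple enough to count directly. The Python sketch below is illustrative only: the shortener domain list, the set of "buzzing" terms, and the thresholds in `looks_like_spam` are all assumptions, not Yahoo’s actual values.

```python
import re
from typing import Dict, Set

# Illustrative values only; not Yahoo's actual lists or thresholds.
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "ow.ly", "is.gd"}
URL_RE = re.compile(r"https?://([^/\s]+)")
HASHTAG_RE = re.compile(r"#\w+")


def spam_signals(text: str, buzzing_terms: Set[str]) -> Dict[str, int]:
    """Count the three signals: buzzing terms, shortened links, hashtags.
    `buzzing_terms` is assumed to be a set of lowercase trending words."""
    words = set(re.findall(r"\w+", text.lower()))
    return {
        "buzzing_terms": len(words & buzzing_terms),
        "short_links": sum(1 for host in URL_RE.findall(text)
                           if host.lower() in SHORTENER_DOMAINS),
        "hashtags": len(HASHTAG_RE.findall(text)),
    }


def looks_like_spam(text: str, buzzing_terms: Set[str],
                    max_buzz: int = 2, max_links: int = 1,
                    max_tags: int = 3) -> bool:
    """Flag an update if any signal exceeds its (hypothetical) threshold."""
    s = spam_signals(text, buzzing_terms)
    return (s["buzzing_terms"] > max_buzz
            or s["short_links"] > max_links
            or s["hashtags"] > max_tags)
```

An update stuffed with four trending hashtags and two `bit.ly` links trips every check, while an ordinary post about the conference trips none. A production system would presumably weigh these signals together rather than use hard cutoffs.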
Even though he focused more on the research side of things, Davtchev was the only panelist to get specific about where revenue from real-time search might come from. He said he anticipates that local promotions around events will be one of the most monetizable areas.
Davtchev also said to expect Yahoo to use what it learns from real-time relevance on non-search properties — which seems fitting, given Yahoo’s search share and also the fact that so many people use Yahoo as a portal to what’s new on the web.
GOOGLE: Google senior product manager Dylan Casey offered some insight into how the search giant determines whether a real-time update is relevant: “How old was the account, how often do they post, were they often outlinking or inlinking, are they often pointing to the same URL?” He said that Google is trying to emphasize comprehensiveness by including non-Twitter providers such as MySpace and (but Google currently has less access to Facebook updates than Microsoft does, so that might not be his best selling point). However, Google plans to soon publish a standard way for sites to push updates directly into the Google index using PubSubHubbub, he said.
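Casey’s questions map naturally onto per-account features. The Python sketch below computes three of them from an account’s post history; the `Post` type, the feature names, and the one-day floor on account age are hypothetical. Inlinking (how often others link to or mention the account) can’t be derived from the account’s own posts, so it is left out here and would need a separate data source.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Post:
    text: str
    urls: List[str] = field(default_factory=list)  # outbound links in the post


def account_signals(account_created: datetime, posts: List[Post],
                    now: datetime) -> Dict[str, float]:
    """Hypothetical features: account age, posting rate, outlinking, URL repetition."""
    # Floor age at one day so brand-new accounts don't divide by zero.
    age_days = max((now - account_created).days, 1)
    all_urls = [u for p in posts for u in p.urls]
    linking_posts = sum(1 for p in posts if p.urls)
    top_url_count = Counter(all_urls).most_common(1)[0][1] if all_urls else 0
    return {
        "account_age_days": age_days,
        "posts_per_day": len(posts) / age_days,
        # Share of posts that link out at all.
        "outlink_ratio": linking_posts / len(posts) if posts else 0.0,
        # Share of all links that point at the single most-linked URL.
        "same_url_share": top_url_count / len(all_urls) if all_urls else 0.0,
    }
```

A 10-day-old account that has posted the same link 20 times scores 2.0 posts per day with `outlink_ratio` and `same_url_share` both at 1.0, a profile that looks more like promotion than conversation.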
Casey said perhaps the most complex problem in real-time search is determining when to trigger real-time results on the search results page. “We have huge internal debates on: Is this a good answer to this question, or are we just creating a tool for low-quality content?” he said.
Casey spent some time justifying Google’s paying to include Twitter’s real-time firehose of tweets, saying it required an intensive technical integration on both sides and that tweets are a fundamentally different form of communication because of the constraints of the format. For example, Google has developed a “complex system” for removing users’ public tweets that are later deleted or marked private.
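The core of honoring deletions and privacy changes can be shown with a toy in-memory index. This Python sketch is not Google’s system; the `TweetIndex` class and the event dictionaries (`{"type": "delete", ...}`, `{"type": "protect", ...}`) are hypothetical. It shows the two cases the article mentions: a single deleted tweet, and an account that goes private, whose past and future tweets must all disappear from results.

```python
from typing import Dict, List, Set, Tuple


class TweetIndex:
    """Toy index that honors deletion and privacy events (hypothetical format)."""

    def __init__(self) -> None:
        self._docs: Dict[int, Tuple[str, str]] = {}  # tweet_id -> (user_id, text)
        self._protected: Set[str] = set()            # users who went private

    def add(self, tweet_id: int, user_id: str, text: str) -> None:
        # Never index new tweets from accounts already marked private.
        if user_id not in self._protected:
            self._docs[tweet_id] = (user_id, text)

    def apply_event(self, event: dict) -> None:
        kind = event["type"]
        if kind == "delete":
            # Remove one tweet; ignore IDs we never indexed.
            self._docs.pop(event["tweet_id"], None)
        elif kind == "protect":
            uid = event["user_id"]
            self._protected.add(uid)
            # Purge everything that user previously posted publicly.
            self._docs = {tid: d for tid, d in self._docs.items() if d[0] != uid}
        else:
            raise ValueError(f"unknown event type: {kind}")

    def search(self, term: str) -> List[int]:
        """Case-insensitive substring match, returning sorted tweet IDs."""
        t = term.lower()
        return sorted(tid for tid, (_, text) in self._docs.items() if t in text.lower())
```

The harder parts in practice are presumably propagating these events quickly to every copy of the index and cache, which this sketch does not attempt.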
But even as the giants are sprinting to keep up with real time, they come off as fairly conservative about tweaking their core search experiences. After hearing from the major search players, a mostly emptied-out room was treated to a second session on the real-time track from four leading startups in the space: OneRiot, CrowdEye, Topsy and Collecta. Seeing things like Topsy’s image search compared to Google’s makes clear the big guys have a lot to learn from the little ones.