Powerset, which implements semantic search, recently released a public beta based on the limited data set of Wikipedia. But while there is no question that Powerset has some interesting and valuable semantic search technology — many of their demo queries produce meaningful summary pages and reference pages with information extracted from Wikipedia content — there are other semantic search engines that produce equally meaningful and relevant results.
In this post, we compare Powerset results with those of a demo implementation from one such search engine, Cognition Technologies. And we compare them both with the current gold standard in web search, Google (again, limited to the Wikipedia data set).
Example 1: Powerset
There are some classes of queries in which Powerset shines, such as queries that involve extracting concepts or aggregating data from a given data set.
For example, check out the beautifully presented results for the following queries; the results extract the key information the user is looking for and provide it in summary format:
“teams in the NFL”
Example 2: Cognition Technologies
On the other hand, there are other types of queries — especially where hardcore semantic parsing is involved — where the Powerset algorithms get confused, and Cognition gives better results:
“rare wildlife of the Amazon”
“football players who went to jail”
Example 3: Google
There are still queries (especially when semantic parsing is not involved) for which Google's results are much better than those of either Powerset or Cognition:
“helicopter carrier Iwo Jima class”
Here, surprisingly, Google has the best results. Powerset has related results, Cognition gets totally confused, but Google nails it!
One area where both Powerset and Cognition improve on Google is the disambiguation of query terms. This is always a significant issue for search engines; for example, when a user types in the keyword Java, does she mean the island, the programming language, or the coffee?
Google has recently tried some experiments in this area, but these new search engines go one better.
When Powerset sees an ambiguous topic, it uses tabs to provide both sets of results:
Cognition handles it in a different way, by letting the user select from among different semantic meanings for each term:
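To make the idea of disambiguation concrete, here is a toy sketch in Python. It is not how Powerset or Cognition actually work (neither engine's internals are described here); the `SENSES` table, its cue words and the `group_by_sense` function are all hypothetical. It simply assumes each meaning of an ambiguous term can be described by a handful of cue words, and groups documents under the meaning they overlap with most, which is roughly what a tabbed or selectable result view needs underneath.

```python
from collections import defaultdict

# Hypothetical sense inventory: each meaning of an ambiguous term is
# described by a few cue words. A real engine would use a far richer lexicon.
SENSES = {
    "java": {
        "island": {"indonesia", "island", "jakarta", "volcano"},
        "programming language": {"language", "compiler", "class", "code"},
        "coffee": {"coffee", "bean", "brew", "roast"},
    }
}


def group_by_sense(term, documents):
    """Assign each document to the sense whose cue words it shares most."""
    groups = defaultdict(list)
    for doc in documents:
        words = set(doc.lower().split())
        best_sense, best_overlap = "unknown", 0
        for sense, cues in SENSES.get(term.lower(), {}).items():
            overlap = len(words & cues)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        groups[best_sense].append(doc)
    return dict(groups)


docs = [
    "Java is an island of Indonesia",
    "Java is a class based programming language",
    "Java coffee is a bean grown for a strong brew",
]
groups = group_by_sense("Java", docs)
```

Each key of `groups` could then back one tab (the Powerset style) or one selectable meaning (the Cognition style).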
For most common searches, Google search works just fine. We’ve all gotten used to the ubiquitous “keyword-ese,” currently the universal language of web search. With Google’s unlimited resources, comprehensive index and formidable prowess in finding relevant results using the PageRank algorithm, it’s going to be difficult for any other search engine to match those results. Users may have to work just a little bit harder for unusual queries or specialized searches, but most users will accept that trade-off in return for using their familiar and beloved search engine. Indeed, the word Google has come to represent web search in the same way that the word Xerox once came to symbolize the process of photocopying.
So what can Powerset (and Cognition) do to gain traction and capture users?
In their recent book, “The Innovator’s Solution,” Clayton Christensen and Michael Raynor discuss how upstart companies challenging market leaders and entrenched incumbents can position new technologies for a reasonable chance of success. One approach that they believe is guaranteed to fail is for these smaller upstarts to rely on evolutionary improvements to get and stay ahead of the major players.
Instead, they suggest shaping the new technology into a disruptive innovation, along either of the following two major axes:
1. New-market strategy: Leveraging the innovation to attract users who do not typically use the product or service, thus growing the market as a whole.
2. Low-end strategy: If there are price-sensitive, over-served users who would be willing to trade some of the advanced functionality in return for a lower price point, then the smaller players have an opportunity to enter the market — that is, if they can figure out a way to make a profit.
In other words, the new players entering the market have to find profitable business opportunities in segments of the market that are not attractive to market leaders.
Under this model, it is apparent that a strategy of challenging Google head-on for control of the mainstream web search market has little hope of success, regardless of the new technologies or search innovations that are applied. Google would have no choice but to fight back with everything it’s got to catch up to or leapfrog this “better search” alternative.
Similarly, since Google search is free for users, there is really no viable low-end strategy, no way to outdo the existing search leader by offering a lower price point.
What about non-participant users? Practically everyone online already uses a web search engine (with Google being the overwhelming favorite). However, Google search follows a specific, consistent set of guidelines: simplicity of UI, speed of response, and relevance based on incoming links. These design parameters take top priority over all other considerations.
By challenging these assumptions, we can discover new use cases in search that are underserved (or not served at all) by Google. Some examples include:
1. UI Simplicity: Google’s minimal UI is trivially simple to use and ideal for a one-size-fits-all model, but it may be less than optimal for complex semantic searches. As Alex Iskold points out in his recent article on the myth and reality of semantic search, a richer user interface would allow power users to express semantically rich search queries and get back better results. Notably, Powerset and Cognition excel at these types of queries.
2. Speed: For some types of advanced searches, users might be willing to wait, perhaps even as long as a day, in order to get back semantically complex results. Imagine a software agent that acts as a virtual search assistant – once the user specifies a query with multiple levels of complexity and dependency, the agent goes off and returns the next day with a list of possible results/options. Queries that require the coordination of complex tasks fall into this category, such as planning a trip that requires coordinating air travel, hotel and car, and minimizing the cost of the whole trip while taking some additional factors into consideration.
3. Relevance: Although all the mainstream search engines use similar criteria to evaluate relevance (mainly, the evidence of incoming links), other relevance algorithms are certainly feasible and may work better for certain classes of queries. Social relevance is an obvious example; reputable premium content is another.
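The third point, alternative relevance criteria, can be illustrated with a minimal sketch. This is an assumption-laden toy and not any search engine's real ranking: the `Result` type, both score fields and the `social_weight` parameter are hypothetical, and both scores are assumed to be already normalized to the range 0 to 1. The sketch blends a link-based score with a social signal and re-sorts the results.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    link_score: float    # link-based relevance, assumed normalized to 0..1
    social_score: float  # hypothetical social signal, assumed normalized to 0..1


def rerank(results, social_weight=0.5):
    """Sort results by a weighted blend of link-based and social relevance."""
    def blended(result):
        return ((1 - social_weight) * result.link_score
                + social_weight * result.social_score)
    return sorted(results, key=blended, reverse=True)


results = [
    Result("example.org/popular", link_score=0.9, social_score=0.1),
    Result("example.org/recommended", link_score=0.6, social_score=0.8),
]
links_only = rerank(results, social_weight=0.0)
blended_order = rerank(results, social_weight=0.5)
```

With `social_weight=0.0` the ordering is driven purely by links; raising the weight lets a page that is well regarded socially overtake one that merely has more incoming links.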
This post is in no way meant to discredit Powerset — they’re in early beta and are doing a fine job of building semantic search. Instead, the examples above clearly demonstrate that the jury is still out on semantic search; other search engines are also contenders in this space, and the race is far from won.
Nitin Karandikar writes about Web 2.0, Internet search and semantic web on his blog, Software Abstractions