Idibon secures $1.4M as it builds a tool to mine the world’s languages

Idibon, an ambitious stealth-mode startup, has closed $1.4 million in seed funding from Khosla Ventures to keep building out its natural-language-processing software. The software helps enterprises glean insight from the sentiments people express in online text, in just about any language you can think of, with a small role reserved for human beings.

The San Francisco company doesn’t want to reveal how everything works yet. But previous work from Idibon CEO Rob Munro hints at what’s possible. In his 2012 Stanford Ph.D. dissertation, entitled “Processing Short Message Communications in Low-Resource Languages,” Munro showed that it was possible to build natural-language-processing systems that could classify text messages and tweets in Chichewa, Haitian Kreyòl and Urdu while handling many variations in word spelling, even with limited training and no prior familiarity with the languages. In the case of the Haitian Kreyòl text messages sent after the January 2010 earthquake in Haiti, automatically prioritizing the messages helped quickly separate genuine emergencies from the rest. The question is whether a tool could be developed to pick up patterns in text in any language. Such a system, combined with a powerful translation tool, could be deployed for a wide variety of applications, from sentiment analysis to intelligence gathering.

Rather than leave machines to bear the whole burden of figuring out what people mean when they communicate in obscure languages, Idibon wants humans to play a role, such as verifying that data is correct. That sort of work could be crowdsourced. “Machines are never going to be 100 percent accurate,” Munro said. The idea of bringing humans and algorithms together to solve problems has surfaced in other applications, several of which were discussed on stage at GigaOM’s Structure:Data 2013 conference in New York last month.

How could enterprises use Idibon? Half a dozen customers are using the beta version of the software in different ways. One is relying on Idibon to run a medical question-and-answer system that returns an answer or a set of possible answers. And “a sales organization” is using Idibon to rifle through news articles, blogs and other documents to map relationships among people and organizations and point to past acquisitions, Munro said. Idibon can also process information in multiple languages to serve up data for business-intelligence applications.

For now, Idibon is “just a simple API service,” Munro said, although some direct integration of Idibon’s data is also under way. The software takes in unstructured data, such as tweets, instant messages and emails, processes it and responds with structured data, he said. Ultimately, though, “we want to become the leading organization for scalable cloud-based natural-language processing,” Munro said.

Native English speakers are a small minority of the world’s population: roughly 375 million people call English their first language, out of more than 7 billion people worldwide, or about 5 percent. That’s why a tool with more universal linguistic powers sounds so appealing. Few enterprises may be looking to capture data in little-known languages today, but doing so could become essential in the coming years. If Idibon can come out with a product soon, it could benefit from a sort of international arms race for truly global understanding.