Diffbot helps apps read the web like humans

Diffbot, a Palo Alto, Calif. startup, is trying to help developers build apps that read the web like humans. What that means is that Diffbot’s technology uses visual learning robotics and artificial intelligence to view content on the web visually, instead of parsing the underlying HTML code. It sounds a little geeky, but it gives developers and publishers a way to analyze the web and organize it in ways that are easy for people to digest, especially mobile users.

The company, which has just opened its first API to the public, is encouraging developers to start using it for a variety of applications. What might these apps look like? Well, AOL joined a private beta earlier this year and is now using Diffbot to help personalize its Editions news reader iPad app by grabbing content and pulling out the top news stories from other sources. Diffbot is able to look at a content source and determine what kind of page it is and what its elements are, such as headlines, images and advertisements, and contextually understand the content on the page. It can determine what the top story is on a news site.

That’s helpful for news aggregators in determining what to present and creating a clean experience. Co-founder Michael Tung told me he’s talking to tablet news aggregator apps that compete with Editions and I’m guessing some will look at implementing Diffbot’s API. These companies are often doing this kind of work themselves but will now have an option in Diffbot.

But it’s not just news apps that can use Diffbot. Nuance uses the Diffbot API to build large domain-specific text corpuses to train its natural language processing system to recognize speech more accurately in specific areas like medicine. Hacker News Radio uses Diffbot to take the top Hacker News stories and turn them into text-to-speech content for an online radio station. You can see some other apps that use Diffbot’s API here.

Tung said Diffbot can really be useful in mobile applications, which have limited screen real estate and can use extra intelligence to help present content.

“We have hundreds of beta developers and a lot of them are working on mobile apps,” he said. “Web pages don’t look very good on mobile devices, they’re designed for large screens. A lot of the apps are using us because they want to incorporate web data and display it in a better way on mobile devices or do something custom.”

Diffbot is releasing two APIs. The first is an On-Demand API that handles two types of pages, front pages and articles: the Frontpage API analyzes home pages and indexes things like headlines, bylines, images, articles and ads, while the Article API can extract article text, pictures and tags from news pages. A second main API called Follow can be used to track any changes or updates made to a web page and pull out the useful data. Developers will get 50,000 API calls per month for free on a self-serve basis, while a cloud plan allocates 100,000 free calls a month and then costs $0.002 per call after that.
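To put the cloud plan's numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (100,000 free calls a month, $0.002 per call after that). The function and constant names are made up for illustration, and it assumes overage is billed per call with no tiers or minimums, which Diffbot has not confirmed.

```python
# Figures quoted for the cloud plan in this article.
FREE_CALLS_PER_MONTH = 100_000
PRICE_PER_EXTRA_CALL = 0.002  # US dollars


def monthly_cost(calls: int) -> float:
    """Estimate one month's bill in dollars for a given number of API calls.

    Assumption: every call beyond the free allowance is billed at the flat
    per-call rate; the real billing rules may differ.
    """
    if calls < 0:
        raise ValueError("calls must be non-negative")
    billable = max(0, calls - FREE_CALLS_PER_MONTH)
    # Round to whole cents to avoid floating-point noise.
    return round(billable * PRICE_PER_EXTRA_CALL, 2)


if __name__ == "__main__":
    for calls in (50_000, 100_000, 250_000, 1_000_000):
        print(f"{calls:>9,} calls -> ${monthly_cost(calls):,.2f}")
```

Under those assumptions, an app making 250,000 calls in a month would pay for 150,000 of them, or about $300.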

Tung said there are about 30 different page types on the web and Diffbot will be opening APIs for those over time, including profile pages, event pages and product pages. As those become available, expect more developers to give Diffbot a try. Developers can grab more data from a variety of web sources and put them together into some interesting apps that should be useful for consumers.

Diffbot was built by Tung and Leith Abdulla, two Stanford students who were able to land funding from Stanford’s venture fund, SSE Ventures. Tung said the company is profitable and is in the process of expanding beyond its current five employees.