What You Didn’t Know About Cloudera

There are a few widespread misconceptions about Cloudera, the promising, well-funded Burlingame, Calif.-based startup that offers services, training and support for the open-source software framework Hadoop. At least that’s what I found out during a talk earlier today with the company’s CEO, Mike Olson.

New Horizons for Hadoop

Cloudera, now just over a year old after launching in October of 2008, has remained a buzz-worthy startup for a number of reasons. One is that the company employs heavy hitters who helped build Hadoop, such as Doug Cutting. Another is that it offers much-in-demand support for Hadoop. The platform has moved well beyond its roots as an Apache-driven open-source project powering hugely scalable search technology at companies such as Yahoo, and now handles many new kinds of complex data query tasks of interest to businesses and organizations of all stripes. Hadoop is emerging as a powerful tool for sifting extensive data sets in newly useful ways. In my talk with Olson, he confirmed that Cloudera still sees a lot of “pent-up demand” from companies that want to leverage Hadoop’s power but need help understanding and using it.

In addition, Cloudera is notable because it’s leveraging the proven business model that Red Hat has deployed around Linux, building a fee-based support and services infrastructure around free, open-source software. Red Hat emerged as one of the big software winners during the recession with such an approach.

But Olson delivered a surprise when he said that it’s wrong to assume that his company is solely focused on open-source software. On the contrary, Cloudera plans to diversify beyond a purely open-source strategy. “Either this quarter or next we will offer an enterprise software bundle consisting of proprietary enhancements for Hadoop users,” Olson said. “Our proprietary apps will complement the open source core, and, like Facebook and Yahoo, we continue to have core committers to Hadoop.”

Cloudera already offers its own distribution of Hadoop, which is downloadable for free, as well as its own proprietary Cloudera Desktop software consisting of dashboard and management tools for Hadoop users. Cloudera Desktop is also currently free, but Olson made clear that, going forward, his company will focus on both free, open-source software and fee-based proprietary software. The enterprise bundle will be the company’s first foray into fee-based software.

Big Data? Try Medium

Also on the surprise front, Olson doesn’t entirely embrace the idea of “Big Data,” which I suggested is currently the driver of Hadoop’s success. “When I hear that term I think that must be a Google thing,” he said. “What about Medium Data? We like to say that Facebook doesn’t run Hadoop because it has a lot of data, but that Facebook has a lot of data because it runs Hadoop. Businesses that use Hadoop find that keeping data is worthwhile because Hadoop helps them process it in new ways.” Olson confirmed that Cloudera is working with plenty of large firms in possession of huge data sets, but is also working with smaller ones.

So who is doing what with Cloudera’s Hadoop distribution? According to Olson, Hadoop usage is extending way beyond just searching data. “We see people interested in it for crunching genomics data, retailers and financial institutions interested in it for processing large sets of transactions, and interest from the health-care and energy industries,” he said. You can find discussion of many use cases for Hadoop in these videos.

Cloudera, like many small companies built around open-source or cloud-centric strategies, is often cited as an acquisition target. Olson told me, however, that his company has a shot at remaining a long-term standalone outfit. “We have a reasonable chance of doing it,” he said, while confirming that the company may eventually pursue an IPO. “We aren’t actively talking to anyone about any type of merger.”

Patent Shmatent

I also asked Olson about Google’s recent move to patent MapReduce, the algorithm for working with large data sets that underlies Google searches. Hadoop is based on a variant of MapReduce, and some have suggested that anyone using Hadoop or MapReduce is in danger following Google’s patent move. As we have noted before, though, Hadoop really isn’t threatened. “Google has no track record of using patents offensively,” Olson noted.

It will be interesting to see what happens to Cloudera as cloud computing and Hadoop-driven data crunching march forward. Despite the focus on staying independent that company founders cite — and I’m convinced they are focused on that — I wouldn’t be surprised to see it get picked up by a larger company.
