Cloudera: All Your Big Data Are Belong to Us

If the next industrial revolution is all about making sense of data, then Cloudera may well prove a primary driver of this Big Data movement as the foundation of the “LAMP stack” for Big Data. It’s still early days in this deep mining of corporate data assets, but as the primary sponsor of the Hadoop open-source project, Cloudera stands to profit. Handsomely.

The company, founded in 2008, has raised two rounds of venture capital funding, but the money coming from customers is more impressive. In just two years, Cloudera has signed over 50 customers, according to Cloudera CEO Mike Olson, and registers over 20,000 downloads each month of its Cloudera Distribution for Hadoop (CDH).

For those unfamiliar with the mechanics of how open-source businesses work, such downloads give Cloudera a fertile hunting ground for prospective customers: so fertile that the company has more than doubled sales each year since its launch in 2008. Cloudera’s customer list includes tech-heavy companies like Rackspace, Bank of America, and LinkedIn, but also reflects a mainstreaming of Big Data with the likes of University of Phoenix gracing the list.

Not that Cloudera can take all the credit. Hadoop is an Apache Software Foundation project and has attracted a diverse array of external contributors, including Twitter, Facebook, and Yahoo, among others. Cloudera is an important contributor, but it’s by no means the only one.

Where Cloudera shines, however, is in taking these different contributions and making Hadoop relevant for enterprise IT, where data mining has waxed and waned over the years. Part of the “waning” has come through the cost and complexity of the systems used to mine corporate data. Unfortunately, Hadoop and its ilk haven’t fixed that problem completely just yet, according to open-source veteran Zack Urlocker:

“You pretty much gotta be near genius level to build systems on top of Cassandra, Hadoop and the like today. These are powerful tools, but very low-level, equivalent to programming client server applications in assembly language. When it works its [sic] great, but the effort is significant and it’s probably beyond the scope of mainstream IT organizations.”

That’s the challenge, and it’s a big one. Fortunately for Cloudera and its investors, the payback for overcoming that challenge is huge, and Cloudera seems to be well on its way toward achieving it. One possible hitch is that the easier Hadoop becomes to use, the less likely enterprises will be to pay Cloudera for a supported version of Hadoop. It’s therefore critical that Cloudera keeps its Cloudera Enterprise (a suite that includes a tailored distribution of Hadoop plus management and monitoring tools) ahead of the basic Hadoop offering. Companies like Facebook may not need such assistance, but mainstream IT shops likely will.

Cloudera, in other words, is banking on the complexity of Hadoop to drive enterprise IT to its own Cloudera Enterprise tools. It’s a good bet, as a similar strategy has paid off for Red Hat. So, while a quick scan of the agenda for the upcoming Hadoop World suggests a geeknerati, early-adopter crowd still dominates the discussion around Hadoop, Cloudera is working hard to change this.

Olson tells me he’s been surprised by how rapid the adoption of Hadoop has been within enterprise IT. While he wouldn’t provide details on ongoing customer negotiations, he made it clear that Cloudera is signing an increasing number of customers that one wouldn’t normally classify as early adopters.

Competitors that also provide support for Hadoop, like Karmasphere, are making headway too, but with the bulk of the core Hadoop contributors, Doug Cutting among them, on Cloudera’s payroll, it’s no surprise that Cloudera is leading the pack.

The consumer web increasingly demonstrates the power of unlocking the data behind social connections, a trend that has contributed to enterprise IT demanding the same depth of Big Data analysis. Cloudera still has a lot of work to do to lower barriers to Big Data adoption among the less technically literate set, but it’s on the right track and, as its growing customer list shows, that track is paying dividends.