Apache Hive creators raise $13M for their Hadoop service, Qubole

Qubole, the Hadoop-as-a-service startup from Ashish Thusoo and Joydeep Sen Sarma, has raised a $13 million series B round of venture capital led by Norwest Venture Partners. Thusoo and Sen Sarma created the Apache Hive data warehouse framework for Hadoop while at Facebook several years ago, and launched Qubole in mid-2012. The company has now raised $20 million in total.

Qubole is hosted on the Amazon Web Services cloud, but can also run on Google Compute Engine, and behaves as one might expect a cloud-native Hadoop service to behave. It has a graphical user interface and connectors to several common data sources (including cloud object stores), and it takes advantage of cloud capabilities such as autoscaling and spot pricing for compute. The company claims it processes 83 petabytes of data per month and that its customers used 4.96 million cloud compute hours in November.

What’s interesting about Qubole is that although it originally boasted optimized versions of Hive and other MapReduce-based tools, the company now also lets users analyze data using the Facebook-created Presto SQL-on-Hadoop engine, and it is working on a service around the increasingly popular and very fast Apache Spark framework.


Ashish Thusoo at Structure Data 2013.

Qubole’s announcement follows that of a $30 million round for Altiscale on Wednesday and a $3 million round for a newer company called Xplenty in October.

In an interview about Altiscale’s funding, its founder and CEO, Raymie Stata, said his company most often runs up against Qubole and Treasure Data, and occasionally Xplenty, in customer deals. They’re all a little different in terms of capabilities, user experience and probably even target user, but they’re all much more fully featured and user-centric than Amazon Elastic MapReduce, which is the default Hadoop cloud service.

The Hadoop-as-a-service space could be setting itself up for consolidation as investors keep putting money into it and bigger Hadoop vendors keep trying to bolster their cloud computing stories. Cloudera, Hortonworks, MapR, IBM, Pivotal, Oracle and others all see a future where more workloads will move to the cloud, but they’re all rooted in the software world. At some point they’re going to have to build up their cloud technologies and knowledge, or buy them.