Cloudera acquires self-service data-modeling startup

Hadoop vendor Cloudera is moving closer to the business intelligence space by acquiring a startup whose software analyzes users’ offline queries to determine which ones are most important to the business and how they might be improved.

Here’s how the startup’s website explains the way its software works:

  • Step 1. [The software] only needs your queries to profile and understand the business logic. Intuitive dashboards act as mission control for the workload, showing what queries are critical to your business, what these queries do and which data is accessed most often.
  • Step 2. Generate schemas from your actual data usage to map query access patterns. [The software] monitors these patterns on a rolling basis, identifying optimizations in the data model and making modernization recommendations.
  • Step 3. Turn recommendations into new data models without having to hand code. [The software] batch transforms existing objects at the click of a button, then monitors their ongoing performance.
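
To make the first two steps a little more concrete, here is a minimal sketch of what query-workload profiling can look like. It is not the startup's actual method, which the website does not detail: the sample queries, the regular expressions and the `fingerprint` and `profile` helpers are all hypothetical, and a real profiler would use a proper SQL parser rather than pattern matching.

```python
import re
from collections import Counter

# Hypothetical sample workload; a real profiler would read query logs.
QUERIES = [
    "SELECT name FROM customers WHERE id = 42",
    "SELECT name FROM customers WHERE id = 7",
    "SELECT o.total FROM orders o JOIN customers c ON o.cust_id = c.id",
    "SELECT SUM(total) FROM orders WHERE day = '2015-01-01'",
]

# Naive assumption: a table name is whatever follows FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
# Quoted strings and standalone numbers are treated as literals.
LITERAL_RE = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def fingerprint(sql):
    """Replace literals with '?' so queries that differ only in constants group together."""
    return LITERAL_RE.sub("?", " ".join(sql.split()))


def profile(queries):
    """Count how often each table is read and how often each query shape recurs."""
    tables = Counter()
    shapes = Counter()
    for sql in queries:
        tables.update(name.lower() for name in TABLE_RE.findall(sql))
        shapes[fingerprint(sql)] += 1
    return tables, shapes


tables, shapes = profile(QUERIES)
```

Calling `tables.most_common()` and `shapes.most_common()` then ranks the most frequently accessed tables and the most common query shapes, which is the kind of signal a schema recommendation could start from.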


For Cloudera, the acquisition represents a chance to help its customers make better use of data by optimizing their queries for better performance and perhaps helping them find the right data store for the job. That could be Cloudera Impala, a NoSQL database or even a standard relational database. Anupam Singh, the startup’s co-founder and CEO, described part of the company’s rationale and process in a blog post announcing the acquisition:

“[The startup’s] first customer executes nearly 8.4 million (yes, million!) SQL queries annually against various data stores. This begs the question: How many of these queries have access patterns that could benefit from a new data model? The customer did not have a clear answer, and we saw an opportunity. Today, [the startup’s] profiler is used to identify the most common data access patterns and [its] transformation engine is used to generate the schema design for modern data stores such as Impala.”

If Cloudera is serious about becoming an enterprise data platform company that lives up to the “enterprise data hub” software it’s selling, these are the types of acquisitions it needs to make and nurture. The name of the game isn’t just getting more data into Hadoop and focusing all development around Hadoop, but working to improve the whole ecosystem of technologies around and above Hadoop, as well.

In the broader Hadoop market, the acquisition is just another move in a game of strategy that has been going on for several years and likely will go on for several more. Last week, for example, Cloudera rival Hortonworks, which had a successful initial public offering in December, announced a Hadoop data governance initiative along with customers Target, Merck and Aetna, and invited Cloudera to join (hint: it probably won’t). Last year around this time, Cloudera secured a massive investment from Intel and several others that put more than half a billion dollars in the company’s war chest.

Cloudera CEO Tom Reilly (left) at Structure Data 2014. (c) Jakub Moser

We’ll hear a lot more about all of this at our Structure Data conference in March, which features talks from Cloudera CEO Tom Reilly, Hortonworks CEO Rob Bearden and MapR CEO John Schroeder.

The startup is Cloudera’s fourth acquisition; the previous one was an “acquihire” (to use a Silicon Valley term of art) of data science specialists Datapad in September. The startup was founded in October 2013 and had raised money from Mayfield Fund.