There’s a saying in the business world that data analysts spend 80 percent of their time preparing data and only 20 percent of their time actually analyzing it. A Redwood City, Calif.-based startup called Paxata wants to turn that ratio on its head, and has raised $10 million from Accel Partners, $8 million of that in a series B round announced on Monday, in order to do that.
Paxata was in stealth mode until Monday, but that hasn’t stopped it from racking up some big customers and partners for its offering, which the company provides as a cloud service. Among its five paying users are Dannon, Box, UBS and Pabst Brewing Company. Paxata also has strategic partnerships with Tableau, QlikView and Cloudera.
What they’re paying for, and partnering with, is a service that learns from business users’ data and gets it into analyzable shape in a hurry. It’s the middle ground between raw data and data that a BI application can understand, co-founder and CEO Prakash Nanduri explained. Users load data into Paxata from their databases, public data sources or just Excel files, and Paxata’s semantic algorithms get to work understanding what each column represents, and where there are holes in the data or other problems.
In that sense, Paxata’s service, which it calls Adaptive Data Preparation, isn’t entirely different from the data-formatting experience in analytics services such as Datahero or ClearStory, which also try to eliminate the data-prep legwork by understanding what they’re seeing. If there’s a big difference, though, it’s that Paxata puts a premium on merging datasets together into a single dataset that an application like Tableau could analyze without the user having to understand the intricacies of table joins or other database tricks.
Nanduri gave a simple example of a customer like Pabst loading in four different datasets relating to sales and distribution of Pabst Blue Ribbon beer. Paxata’s software might realize that SKU and “product ID” are the same thing, even though they’re labeled differently in different datasets, and suggest the user stitch them together. If the user clicks “yes,” Paxata merges the two datasets as it sees fit and produces a final dataset that can be loaded and analyzed without any further effort.
“The system’s intelligence is guiding you toward how those all fit together,” co-founder and VP of Products Nenshad Bardoliwalla said.
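To make the SKU and “product ID” example concrete, here is a minimal sketch of the general idea: compare the values in each pair of columns, suggest the pair that overlaps most as a join key, and merge on it. This is not Paxata’s algorithm, which the company did not describe in detail. The function names, the value-overlap (Jaccard) heuristic, the 0.5 threshold and the sample rows are all assumptions made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of values."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_join_key(left, right, threshold=0.5):
    """Return (left_column, right_column, score) for the best-overlapping
    column pair, or None if no pair reaches the (assumed) threshold."""
    best = None
    for lcol in left[0].keys():
        lvals = [row[lcol] for row in left]
        for rcol in right[0].keys():
            rvals = [row[rcol] for row in right]
            score = jaccard(lvals, rvals)
            if score >= threshold and (best is None or score > best[2]):
                best = (lcol, rcol, score)
    return best


def merge_on(left, right, lcol, rcol):
    """Inner-join two lists of dicts where left[lcol] == right[rcol]."""
    index = {}
    for row in right:
        index.setdefault(row[rcol], []).append(row)
    merged = []
    for lrow in left:
        for rrow in index.get(lrow[lcol], []):
            combined_row = dict(lrow)
            # Drop the right-hand key column, since it duplicates lcol.
            combined_row.update({k: v for k, v in rrow.items() if k != rcol})
            merged.append(combined_row)
    return merged


# Hypothetical sample data: the same products labeled two different ways.
sales = [
    {"SKU": "PBR-12", "units_sold": 340},
    {"SKU": "PBR-24", "units_sold": 125},
    {"SKU": "PBR-KEG", "units_sold": 18},
]
distribution = [
    {"product ID": "PBR-12", "distributor": "North"},
    {"product ID": "PBR-24", "distributor": "South"},
    {"product ID": "PBR-KEG", "distributor": "North"},
]

suggestion = suggest_join_key(sales, distribution)
if suggestion is not None:
    # In a real tool this is where the user would be asked to confirm.
    lcol, rcol, score = suggestion
    combined = merge_on(sales, distribution, lcol, rcol)
    print(f"Suggested join: {lcol!r} = {rcol!r} (overlap {score:.2f})")
    for row in combined:
        print(row)
```

A production system would presumably also weigh column names, data types and previously confirmed matches; value overlap alone is just the simplest signal to demonstrate.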
And it remembers users’ data and preferences, too. So not only will it suggest the same thing next time someone loads new data from the same sources, but it will recognize similar types of data and possibly recommend similar types of actions on it or labels for it. The company is focused on a handful of specific vertical industries, too (high-tech, retail, consumer packaged goods and financial services), and it has a broad understanding of the different types of data and data sources those industries use.
“What you need to know is what product was bought by which customer,” Nanduri added. “That’s what you need to know.” The implication is that users don’t need to know about data integration, quality, enrichment and governance, because the Paxata service handles all of that.
The work Paxata does might not seem as sexy as applying machine learning algorithms to big data or finding the proverbial needle in a haystack, but it’s arguably more useful. Anyone who has ever tried to work with data — even just simple tables — understands how time-consuming the preparation process can be, and that doesn’t even take into account trying to use the tables that a Hadoop job might return or trying to join several tables together. There are lots more business users than there are data scientists, and they need help.
Last October, in fact, Accel funded another company, Trifacta, in this same space. That company, founded by data gurus and professors Joe Hellerstein and Jeffrey Heer, seems targeted at the same use case, although its product has yet to hit the market.
Feature image courtesy of Shutterstock user adriano77.