Precog, a Boulder, Colo.-based startup that’s trying to seed the market for advanced analytics on unstructured data, is coming out of beta on Thursday with a line of appliances designed to let everyday users get started on making sense of social, web and application data. The company’s underlying technology has remained the same since we profiled Precog in September, but a journey into the world outside Silicon Valley has changed its thinking about how to market and deliver its product.
Put simply, Precog’s technology lets users ask questions of their unstructured data (e.g., stuff sitting in Hadoop, MongoDB or any other non-relational data store) in whatever format it was created — JSON, logfile, XML, what have you. This is different from the standard operating procedure of querying unstructured data — including the current SQL-on-Hadoop craze — which usually involves somehow transforming data into a format that a relational engine can read before beginning the analysis. Precog also features visualizations, charts and reports designed with these new types of data, and presumably larger datasets, in mind.
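To make the contrast concrete, here is a minimal Python sketch of querying records in their native JSON shape, with no upfront step that flattens them into relational tables. It is not Precog's query language or API; the sample events, the `get_path` helper and the `revenue_by_plan` function are all hypothetical names invented for illustration.

```python
import json
from collections import Counter

# Hypothetical event records as JSON lines. The shapes differ on purpose:
# some have no "cart", and one has no "plan" on the user.
RAW_EVENTS = """\
{"user": {"id": 1, "plan": "pro"}, "action": "login"}
{"user": {"id": 2}, "action": "purchase", "cart": {"items": [{"sku": "a1", "price": 20}, {"sku": "b2", "price": 5}]}}
{"user": {"id": 1, "plan": "pro"}, "action": "purchase", "cart": {"items": [{"sku": "a1", "price": 20}]}}
"""


def get_path(record, path, default=None):
    """Follow a dotted path through nested dicts; return default if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def load_events(text):
    """Parse one JSON document per non-empty line, keeping each record's own shape."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def revenue_by_plan(events):
    """Sum purchase revenue per user plan, tolerating missing fields."""
    totals = Counter()
    for event in events:
        if event.get("action") != "purchase":
            continue
        plan = get_path(event, "user.plan", default="unknown")
        items = get_path(event, "cart.items", default=[])
        totals[plan] += sum(item.get("price", 0) for item in items)
    return dict(totals)


events = load_events(RAW_EVENTS)
print(revenue_by_plan(events))
```

The point of the sketch is that missing or extra fields are handled when the question is asked, rather than being forced into a fixed schema before any analysis can begin.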
However, Founder and CEO John De Goes told me, the company came to realize over the past several months that as much as what it’s doing might fall under the “data science” umbrella, that’s the wrong messaging. Outside of Silicon Valley, he said, “a lot of companies don’t have the technological sophistication to understand the whole data science thing.” They just want to know that they can ask deeper questions of the new data types they’re storing in their NoSQL databases without having to perform ETL operations on that data or write a lot of complicated code.
And the bigger those companies are, Precog COO Jeff Carr said, the less likely they are to want a cloud service like the one Precog initially offered.
So the company took both lessons to heart and is rolling out a line of appliances (physical or virtual) that complement its flagship cloud service, each targeting specific use cases. The first three are social media, web analytics and application data, and the appliances are equipped with baked-in capabilities important to each of those fields. The social media one, for example, will feature advanced sentiment analysis and natural language processing, while the web analytics one will focus on features such as behavioral clustering.
Under the covers, though, each appliance still runs on the broader Precog platform, Carr noted, and someone who buys one just to get started in a specific area can pretty easily (i.e., without reaching “super-coder” status) turn it toward other data types and other types of analysis. But right now, De Goes added, no one really knows what it means to have an analytics product designed for unstructured data, so the appliance approach should make it easier for large enterprises and non-tech companies to digest.
It’s a “baby steps” situation, explained Carr: “Don’t sit there and try to think about how to solve every problem all at once. Let’s try to sit there and think about data types you know you’re having problems with [now].”
Analyzing data in its native format has advantages beyond just omitting an extra transformation step, though, and the Precog team thinks companies will get hip to these advantages as they come to understand the analytic aspects of non-relational databases as well as they understand the operational aspects. Oftentimes, these will be new use cases, which is why Precog considers itself more complementary to than competitive with traditional data warehouses, SQL-on-Hadoop tools and BI software.
One early customer is using Precog to match up résumé data — often enhanced résumé data — with job openings, which is a tricky proposition in a relational format because résumés can include so much personalized information or content that doesn’t fit into a schema at all, really. Another user, a large telco, is trying to build new data products for its customers by mashing together all sorts of internal and third-party data in numerous formats.
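As a rough illustration of why schema-free résumés are awkward in tables but manageable as nested documents, the following Python sketch scores résumés against a job opening by walking whatever structure each résumé happens to have. This is an assumption-laden toy, not the customer's or Precog's actual method; the sample résumés, the opening and the `extract_terms` and `match_score` functions are hypothetical.

```python
def extract_terms(value):
    """Recursively collect lowercase word tokens from the string values of any nested structure."""
    terms = set()
    if isinstance(value, str):
        terms.update(word.strip(".,;:()").lower() for word in value.split())
    elif isinstance(value, dict):
        for child in value.values():
            terms |= extract_terms(child)
    elif isinstance(value, list):
        for child in value:
            terms |= extract_terms(child)
    terms.discard("")
    return terms


def match_score(resume, opening):
    """Fraction of the opening's required skills mentioned anywhere in the résumé."""
    required = {skill.lower() for skill in opening["required_skills"]}
    if not required:
        return 0.0
    return len(required & extract_terms(resume)) / len(required)


# Two hypothetical résumés with no shared schema beyond "name".
resume_a = {
    "name": "A. Example",
    "skills": ["Python", "SQL"],
    "projects": [{"title": "Log pipeline", "tools": ["Hadoop"]}],
}
resume_b = {
    "name": "B. Example",
    "summary": "Built dashboards in SQL and Excel.",
    "hobbies": {"open_source": "Maintains a Python linting tool"},
}
opening = {"title": "Data engineer", "required_skills": ["python", "sql", "hadoop"]}

for resume in (resume_a, resume_b):
    print(resume["name"], round(match_score(resume, opening), 2))
```

Simple token overlap is a deliberately crude stand-in for real matching logic; the sketch only shows that the skills can be found wherever they appear, whether in a list, a free-text summary or a nested field.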
Carr compared the change to the shift from flat files to relational data decades ago. “It’s happening again,” he said. “It has to happen again … people are not going to abandon JSON because it doesn’t fit neatly inside a table.”
Precog is telling the right story around why unstructured analytics matters, but one has to assume there will be a major shakeout in the big data analytics space over the next few years. There are only so many new technologies companies can absorb at once — Hadoop, NoSQL, SQL on Hadoop, unstructured analytics, Platfora, in-memory, stream processing, next-gen analytic databases, etc. — and it’s hard to predict which messages and capabilities will win out.
However, unless Hadoop really does become the lone dumping ground for all non-operational data, regardless of the source, technologies like Precog that can act as the analytics layer across numerous data stores would seem to have an advantage.