As 2013 gains steam, the big data and analytics world is radically changing. Just these past few weeks, I’ve spent time with industry thought leaders including Mayank Bawa, Mike Olson and Scott Yara, talking about the emergence of a new analytics stack that displaces the current BI-ETL-EDW paradigm. This new stack fundamentally rethinks data management, analytic transparency and user consumption, bringing them together in a more cohesive platform that removes the enormous latencies and waste in how analytic software is designed and deployed today.
Hadoop has become the foundation for managing big data in organizations large and small. Its amazing pace of innovation has only accelerated with recent announcements concerning Greenplum Pivotal HD, Hortonworks Stinger and Cloudera Impala. The trajectory of these projects is crystal clear: the major Hadoop distribution providers are introducing real-time, interactive queries on top of HDFS. This brings together the best of both worlds: well-known SQL-based query processing and the horizontal scale-out ability of HDFS’s storage architecture.
Predictive analytics are essential for data-driven leaders to craft their next best decision. There are a variety of techniques across the predictive and statistical spectrums that help businesses better understand the not-too-distant future. Today’s biggest challenge for predictive analytics is that they are delivered in a very black-box fashion. As business leaders rely more on predictive techniques to make great data-driven decisions, there needs to be much more of a clear-box approach.
Analytics need to be packaged with a self-description of data lineage, a derivation of how calculations were made and an explanation of the underlying math behind any embedded algorithms. This is where I think analytics need to shift in the coming years: quickly moving away from black-box capabilities, while deliberately putting decision makers back in the driver’s seat. That’s not just about the analytic output, but about how it was designed, its underlying fidelity and its inherent lineage, so that trusting in analytics isn’t an act of faith.
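To make the clear-box idea a little more concrete, here is a minimal Python sketch of a prediction that carries its own lineage, derivation and math along with the answer. It is only an illustration under stated assumptions: the names `ClearBoxResult`, `fit_line` and `predict`, the file name `sales_by_quarter.csv` and the sample numbers are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ClearBoxResult:
    """A prediction bundled with what a reader needs to audit it (hypothetical type)."""
    value: float                                  # the analytic output
    formula: str                                  # the underlying math, spelled out
    coefficients: dict                            # how the calculation was derived
    lineage: list = field(default_factory=list)   # where the input data came from


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return y_bar - slope * x_bar, slope


def predict(xs, ys, x_new, source):
    """Return the forecast together with its formula, coefficients and lineage."""
    intercept, slope = fit_line(xs, ys)
    return ClearBoxResult(
        value=intercept + slope * x_new,
        formula="y = intercept + slope * x (ordinary least squares)",
        coefficients={"intercept": intercept, "slope": slope},
        lineage=[f"source: {source}", f"rows used: {len(xs)}"],
    )


# Hypothetical data: four quarters of sales, forecasting the fifth.
result = predict([1, 2, 3, 4], [10.0, 12.0, 14.0, 16.0], 5, "sales_by_quarter.csv")
print(result.value)
print(result.formula)
print(result.coefficients)
print(result.lineage)
```

The point of the sketch is the return type rather than the model: whoever consumes `result.value` can also inspect the formula, the fitted coefficients and the source of the data instead of taking the number on faith.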
Even after achieving analytics transparency, challenges remain in a number of places: rolling out repeatable applications, creating best practices, collaborating across organizations, evolving what was built, seamlessly recombining models, and eventually either sharing the strongest content back to the broader community or securely maintaining higher-value analytic intellectual property. This iterative, responsive approach to user consumption is key to modern analytical success. This is where something like an app store for analytics can really drive user adoption.
This brings me to the new analytic stack. The need for a modern, purpose-built analytic stack is critical. This is a stack that doesn’t worry about the source or shape of the data that is coming into it, but one that is able to ingest structured, unstructured and semi-structured sources seamlessly. One that can create meaningful output, can deliver clear-box predictive analytics and can quickly deploy analytic applications for broader user consumption.
Recently, Gartner released the 2013 BI and Analytics Magic Quadrant, while Wikibon released its 2013 Big Data Market Forecast. Both reports send a clear signal that even as analytics is taking center stage, yesterday’s BI-ETL-EDW stack is ill-suited to tomorrow’s needs, and quickly becoming irrelevant.
GigaOM’s Structure: Data conference in New York last week was an even clearer sign that this shift to a new analytic stack is happening. We saw an incredible yet humbling validation that this future is already here:
- Hadoop (and NoSQL) are significantly disrupting how we manage data, especially at petabyte scale.
- The rise of R and Stata over black-box analytics in academic circles is a strong leading indicator of where the commercial world is headed.
- Analytic consumption is beginning to move away from just data scientists to analysts and end-users via pre-packaged content and applications.
As this new analytic stack emerges, the big data community will continue to be an exciting place for the decade to come.
George Mathew is president and COO of Alteryx. You can follow him on Twitter at @gkm1.
Feature image courtesy of Shutterstock user ramcreations.