Big data needs a product like Microsoft Access

The trend toward self-service in business analytics has been good for the big data industry. But for the user-oriented paradigm to take deep root, the industry needs to change how it approaches self-service. The information worker and data analytics worlds need a big data product akin to Microsoft Access.

Access itself was never especially safe or appropriate for production database applications. But business users were able to use it to stretch their imaginations. By having a tool that could build working database application prototypes, users were able to take ownership of what databases could do and how they could be used. Access allowed business users to experiment with databases and implement something in a relatively short period of time. Access provided everything necessary to create a database system that could almost work in a mission-critical capacity.

While lauding a tool that facilitates incomplete success may seem absurd, such a tool is essential to getting successful systems built. Access allowed users to actualize the systems that they wanted, and those systems that passed subsequent peer review created user demand and demonstrated efficacy. This situation was no cul-de-sac, as many such systems were eventually re-implemented by specialists using more professional tools. Without Access, arguably, those systems would not have been implemented at all.

A user tool helps production tools

Access heralded the beginning of self-service data management and, perhaps ironically, it gave rise to widespread adoption of client/server databases in the enterprise. In order for Hadoop and other big data analytics technologies to see the same sort of adoption, we need a tool like Access that can serve as a catalyst, allowing business users to model concretely the kinds of big data systems that they need.

Such a product, call it “Big Access”, would connect to cloud data sources, spreadsheets, enterprise data sources, log files, and perhaps certain machine data beyond those log files. Big Access would also provide functionality for data quality, data blending, and data shaping. It would provide basic data visualization capabilities, though it would leave the fancy stuff to tools that already cover the visualization space.

Big Access would also provide predictive analytics functionality. The amount of explicit effort required to build a predictive model on existing data in Big Access would actually be quite small. Big Access would build such models transparently, in the background, so that the business user could run predictive queries on a whim.
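To make the blending and data quality ideas in this section concrete, here is a minimal sketch in Python. Big Access is a hypothetical product, so nothing here is its API: the sample data, the `blend` function, and the log line layout (date, customer ID, event) are all assumptions made for illustration. The sketch joins a spreadsheet-style customer list to raw log lines and skips malformed records.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: a spreadsheet export and a few raw log lines.
CUSTOMERS_CSV = """customer_id,name
c1,Acme
c2,Globex
"""

LOG_LINES = [
    "2015-01-05 c1 login",
    "2015-01-05 c2 login",
    "2015-01-06 c1 purchase",
    "2015-01-06 c9 login",   # customer ID missing from the spreadsheet
    "garbled line",          # malformed record, dropped below
]


def blend(customers_csv, log_lines):
    """Count log events per customer name by joining logs to the spreadsheet."""
    reader = csv.DictReader(io.StringIO(customers_csv))
    names = {row["customer_id"]: row["name"] for row in reader}

    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # data quality step: skip lines that do not parse
        _date, customer_id, _event = parts
        # Data blending step: look up the log's customer ID in the spreadsheet.
        counts[names.get(customer_id, "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    print(blend(CUSTOMERS_CSV, LOG_LINES))
```

A real tool would do this across far larger sources and without asking the user to write the join, but the shape of the work is the same: reconcile keys, discard bad records, and aggregate.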

Beyond bits and pieces

We have tools that fulfill some of these capabilities already. But current products are task-driven; they have a specific purpose and are used explicitly for that purpose. By contrast, Big Access would provide functionality that business users don’t necessarily realize they need. Big Access would determine from context which analytics capabilities were required and would be most useful. It would then make those capabilities available to the user, without burdening the user with a manifest of the underlying technologies.

Big Access could run on top of Hadoop. Big Access could run on top of Apache Spark. It could also run on top of Spark Streaming and Spark’s MLlib and even on top of Spark SQL or Hive or Pig. You get the idea. Big Access wouldn’t provide innovative big data technology. It would provide innovation in the usability of existing big data technology.

Developers need it too

Big Access would be programmable. In Java. In Python. In C#. In JavaScript. No programming would be required, but custom code would be accommodated. Big Access would be queryable using SQL and could be integrated into mainstream programming environments as if it were a relational database. In fact, a Big Access database, developed by a business user and deployed to a company server, could immediately be integrated into a line-of-business application by any enterprise developer.

Of course, the same developers could integrate their applications with Hadoop today almost as easily, but many developers don’t realize this. A simple desktop tool that deployed the database to a company server would in fact be more approachable to many developers than would a Hadoop cluster. After the Big Access database was migrated to full-fledged Hadoop, the application could be migrated to Hadoop as well. In this way, Big Access would provide an on-ramp to big data technology for business users and enterprise developers alike.
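The developer experience described in this section, querying the deployed database with plain SQL as if it were a relational database, might look something like the following sketch. Since Big Access does not exist, Python's built-in `sqlite3` module stands in for it here; the `page_views` table, its columns, and both function names are assumptions invented for the example.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory SQLite database as a stand-in for a deployed
    Big Access database (hypothetical table and sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (region TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO page_views (region, views) VALUES (?, ?)",
        [("east", 120), ("west", 80), ("east", 30)],
    )
    conn.commit()
    return conn


def views_by_region(conn):
    """Run the kind of ordinary SQL aggregate a line-of-business app would issue."""
    rows = conn.execute(
        "SELECT region, SUM(views) FROM page_views "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_database()
    print(views_by_region(conn))
    conn.close()
```

The point of the sketch is that the application code knows only a connection and a SQL statement. If the data later moved to a full Hadoop deployment with a SQL interface, the query could in principle stay the same while the connection changed.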

Enable individuals, win the enterprise

When users can work with products in relative privacy, a greater intimacy between those users and the products can emerge. For example, this is why so much data work gets done in Excel even when, technologically, it is not always the best tool for the job. This is also why people use search engines. And, in fact, this is why so many users have worked with Microsoft Access itself.

Big Access would provide a bridge to users. Some, including entrepreneurs and technologists, may view that as mere fit and finish. But the absence of a tool like Big Access is holding back broader success for big data technology and, ultimately, for those same entrepreneurs and technologists.

If we want data and analytics to be as essential to information workers as documents, spreadsheets, presentations, email, and search are today, then we need big data tools to be as ubiquitous, approachable, and commonplace as search engines and office suite applications. We are not there yet. We need to be there. Perhaps 2015 will be the year.