Some big thoughts on big data and cloud for 2012

2011 was the beginning of the big data onslaught, but hold onto your hats: big data will only get bigger in 2012.

I’ve spoken to a bevy of experts in the last few weeks, ranging from venture capitalists to vendor execs. Here are some of their thoughts on how the world of big data and cloud computing infrastructure will shake out next year.

1: Look for a battle of the PaaSes

Nearly every vendor has a platform as a service now, and it’s not clear there’s enough business for all of them. Microsoft’s (s MSFT) Azure, Salesforce.com’s (s CRM) Heroku and Red Hat’s (s RHT) OpenShift all promise multi-language support, but they’re facing an array of spunky upstarts in the forms of AppFog, StandingCloud, Engine Yard and DotCloud, among others.

The founder of one Silicon Valley (non-PaaS) startup who did not want to be named summed it up: “There are far too many PaaSes out there. There will definitely be a shakeout.”

2: Legacy players try to co-opt/convert Hadoop momentum

As we all know, Hadoop is great. But this framework for handling distributed data is not a miracle worker, and hiring Hadoop expertise to build solutions is very expensive. That’s why companies like General Electric (s GE) and Hewlett-Packard (s HPQ) will pitch their own non-Hadoopy wares as an alternative, less labor-intensive way to solve big data problems.

Last month, Nicole Egan, CMO of Autonomy, now a unit of HP, said Hadoop has its place but can’t be all things to all people. CIOs are faced with an emerging problem set, and some try to put the Hadoop and MapReduce pieces together to solve those problems. “But, at heart, [those solutions] don’t understand what’s in the content; they [are] counting the number of times a word appears as a proxy for meaning. And they are limited to text, where [Autonomy] does audio and video.”
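The word counting Egan describes is essentially the canonical MapReduce example. A minimal sketch in plain Python (the function names are illustrative, not from any Hadoop API):

```python
from collections import defaultdict

def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Sum the counts per word -- frequency as a proxy for meaning."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data gets bigger", "big data needs big tools"]
counts = reduce_phase(map_phase(docs))  # e.g. counts["big"] == 3
```

In a real Hadoop job, the map and reduce phases run in parallel across a cluster; the point here is simply that the output is word frequencies, which is what Egan means by counting as a stand-in for understanding.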

3: Specialized databases take hold

It’s not just database pioneer Michael Stonebraker, of Ingres, Postgres and Vertica fame, who thinks that specialized databases are the way to go. He’s now pushing VoltDB as a specialized database for very fast transaction processing, a workload he says traditional relational databases are too slow for.

VoltDB does what it does — fast online transaction processing — well, but makes no pretense of being a data warehouse. “One size does not fit all here. I’m a huge fan of specialization. Greenplum, ParAccel, et cetera, are all good data warehouse systems. They’re great at that but they’re terrible for OLTP,” Stonebraker said at the recent USENIX LISA 2011 conference.
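Stonebraker’s one-size-does-not-fit-all point comes down to access patterns. A toy illustration, resembling no real engine: an OLTP operation touches a whole record at once, while a warehouse query scans one attribute across every record, so a row layout and a column layout each favor one workload and penalize the other.

```python
# Row store: each record kept together -- cheap single-record updates (OLTP).
row_store = [
    {"id": 1, "customer": "acme", "amount": 120},
    {"id": 2, "customer": "globex", "amount": 75},
    {"id": 3, "customer": "acme", "amount": 300},
]

# Column store: each attribute kept together -- cheap full-column scans (OLAP).
col_store = {
    "id": [1, 2, 3],
    "customer": ["acme", "globex", "acme"],
    "amount": [120, 75, 300],
}

# OLTP-style update of one order: a single dict write in the row store,
# but a separate write into each affected column list in the column store.
row_store[1]["amount"] = 80
col_store["amount"][1] = 80

# Warehouse-style aggregate: one contiguous list in the column store,
# versus plucking one field out of every record in the row store.
total_from_columns = sum(col_store["amount"])
total_from_rows = sum(r["amount"] for r in row_store)
```

Real engines add indexes, compression and buffering on top, but the asymmetry sketched here is the core of why specialized OLTP and warehouse systems keep beating general-purpose ones at their own workloads.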

Michael Skok, partner with VC firm North Bridge Ventures, agreed. “If you look at all this unstructured web data, sensor-based data, data that comes in at different rates and formats and suits different uses, it doesn’t make sense to put them all together,” he said. That thesis is what drove the NoSQL movement and, he thinks, will spark a big NewSQL rush as well. “NoSQL is good for unstructured data but there’s an unbelievable amount of structured data and those people don’t want to change their apps.”

In the past, relational databases were able to subsume new workloads. There used to be a flock of object-oriented database companies like Ontologic and Object Design. They disappeared because Oracle, IBM and Microsoft were able to incorporate some object capabilities into their relational databases.

That may not happen this time. For one thing, all that unruly, unstructured data is now at least digitized. “Back in the object database era, those new data types — voice, video — hadn’t really exploded. They weren’t all digitized. Now it’s so cheap to digitize and store them, things will shake out differently,” Skok said.

4: Segmented data centers gain momentum

The push is on by companies to build out data center capabilities (either in-house, in a co-lo or in a public cloud) with commodity hardware to handle webscale loads. At the same time, they want to preserve existing investments in applications.

So, large companies are segmenting their data centers, putting homogeneous server farms into one section to house new workloads but retaining a legacy section to run the application workhorses of the past. You know, the NetWare server that’s been running since the late 1990s. It ain’t glamorous, but it works.

“Customers want an environment where they have one throat to choke for new loads. They try to make their data center more future-friendly and doing it this way means they can add capacity easily,” said Peter Panfil, SVP at Emerson Network Power (s EMR).

The older data center segment runs the mishmash of older apps so the company can wring the most value out of software it has already bought. That tale of two data centers will continue for a while, Panfil said.

5: Cloud standards start to emerge

This may be more wishful thinking than prediction. As of now, there is no official standard API for public infrastructure-as-a-service clouds. That means visions of lock-in for developers who write for a given vendor’s cloud. Folks like RightScale CEO Michael Crandell hope against hope that will start to change next year, so that development and operations teams can start writing workloads that really can run across public clouds.

“As much as people talk about [Amazon (s AMZN)] EC2 as a standard, there’s really no IaaS API standards, no resource behavior standards and that’s a problem,” Crandell said. “We haven’t held our breath for standards to emerge, but to the extent they do they’ll make our lives easier.”
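The lock-in Crandell describes is concrete: each provider exposes a different API for the same basic operations, so absent a standard, portability comes from an abstraction layer of the kind RightScale (or the open-source Apache Libcloud) provides. A hypothetical sketch — both fake drivers and the `launch_servers` interface are invented for illustration, and real clouds differ in far more than method names (authentication, regions, image identifiers):

```python
# Two fake providers with deliberately mismatched APIs for the same task.
class FakeEC2Driver:
    def run_instances(self, image_id, count):
        return [f"i-{image_id}-{n}" for n in range(count)]

class FakeRackspaceDriver:
    def create_server(self, image_name):
        return f"server-{image_name}"

def launch_servers(driver, image, count):
    """One call site for the app; per-provider translation underneath."""
    if isinstance(driver, FakeEC2Driver):
        return driver.run_instances(image, count)
    return [driver.create_server(image) for _ in range(count)]

ec2_ids = launch_servers(FakeEC2Driver(), "ami-123", 2)
rax_ids = launch_servers(FakeRackspaceDriver(), "ubuntu", 2)
```

The workload code calls `launch_servers` and never touches a vendor API directly — which is exactly the burden a real IaaS API standard would lift from tools like RightScale.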

Photo courtesy of Flickr user Kevin Krejci.