10 ways big data changes everything

How big data can curb the world’s energy consumption

By Katie Fehrenbacher
The age-old thesis for energy efficiency is “if you measure it, you can manage it.” Once you identify how much energy a person or a building uses, you can reduce its consumption. But in a world where a massive amount of energy data is suddenly emerging — from sensors, devices and the Web — tapping into energy data will take on a whole new meaning, and big data tools could one day become a fundamental way to help the world curb energy consumption.

Opower’s big data plan

A few startups and early-adopter utilities are already turning to big data tools to deliver key aspects of energy efficiency. Opower, a venture-backed energy software startup with offices in Washington, D.C., and San Francisco, tells me it has been transitioning to Hadoop, via startup Cloudera, to run heavy analytics on the data it crunches in the cloud.
Opower currently manages about 30 TB of information (and growing), which includes energy data from 50 million utility customers (across 60 utilities) as well as public and private data about weather and demographics, historical utility data, geographical data and much more. The data is stored and processed in a combination of over 20 MySQL databases and a production Hadoop cluster.
Most of Opower’s data is structured; the exception is the data handled by its system-log processing infrastructure. The data is processed in batch jobs that access both MySQL and Hadoop. The current production Hadoop cluster has 12 nodes, which together provide 80 TB of usable space, 72 cores, 0.5 TB of memory and 120 spindles (disk drives). The Opower analytics team also uses Pentaho analytics and R in its regular business intelligence work.
The result of all of these new tools is that Opower can help utilities’ customers shave about 2 percent off their home energy consumption, either by showing them how well (or poorly) they are doing compared to their peers and neighbors (tapping into shame or guilt) or by suggesting tips like switching to energy-saving lightbulbs.
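To make the peer-comparison idea concrete, here is a minimal Python sketch. It is not Opower’s algorithm: the function name, the sample readings and the choice to report the share of neighbors who use more energy are all assumptions made for illustration.

```python
from statistics import mean

def compare_to_neighbors(household_kwh, neighbor_kwh):
    """Summarize one home's monthly use against a hypothetical peer group."""
    if not neighbor_kwh:
        raise ValueError("need at least one neighbor to compare against")
    avg = mean(neighbor_kwh)
    # Count the neighbors that used more energy than this household.
    used_more = sum(1 for kwh in neighbor_kwh if kwh > household_kwh)
    return {
        "neighbor_average_kwh": avg,
        "percent_vs_average": (household_kwh - avg) / avg * 100,
        "share_of_neighbors_using_more": used_more / len(neighbor_kwh),
    }

# Made-up monthly readings in kilowatt-hours.
report = compare_to_neighbors(900, [600, 750, 800, 1000, 1100])
print(report)
```

With these invented numbers the household sits a little under 6 percent above its neighbors’ average, and two of the five neighbors use more than it does. A real service would presumably also have to decide which homes count as comparable peers, something this sketch ignores.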
Thanks to big data tools like Hadoop and new analytics, Opower can crunch data faster and deliver better results. Opower’s director of West Coast Engineering, Drew Hylbert, and Alex Newman, Opower’s data architect, told me in an interview that the new Hadoop data architecture enables Opower to create new and better algorithms and helps the company compare and aggregate disparate data sets in one place. Hylbert said new Opower services, like one that forecasts a customer’s monthly bill (using three years of historical data), rely heavily on the new data architecture that the company is transitioning to.
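The article does not say how the bill forecast works. As a rough illustration of how several years of history could feed such a forecast, the sketch below averages the same calendar month across the available years and multiplies by a flat price. The function name, the usage figures and the price per kilowatt-hour are all hypothetical.

```python
from statistics import mean

def forecast_monthly_bill(history_kwh, month, price_per_kwh):
    """Naive seasonal forecast: average the same calendar month across years.

    history_kwh maps (year, month) -> kWh used; all values are hypothetical.
    """
    same_month = [kwh for (_, m), kwh in history_kwh.items() if m == month]
    if not same_month:
        raise ValueError("no history for that month")
    return mean(same_month) * price_per_kwh

# Three invented Julys plus one January that the forecast should ignore.
history = {(2009, 7): 1000, (2010, 7): 1100, (2011, 7): 1200, (2011, 1): 700}
july_bill = forecast_monthly_bill(history, month=7, price_per_kwh=0.12)
print(f"Forecast July bill: ${july_bill:.2f}")
```

A production forecast would likely fold in weather and other data sets mentioned earlier; this version only shows the basic shape of the calculation.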
Newman, who is helping lead the Hadoop transition, is a data architect wunderkind who came to Opower from Cloudera. Newman said that he joined Opower because it was inspiring to work on issues as important as energy efficiency.
Newman and Hylbert are the first to point out that their current data sets are not exactly “big data” compared to the data sets of huge Internet firms like Google, Facebook and Amazon. But Opower is rapidly growing, adding more utility customers, and it is also adding more data streams. Even with just 30 TB of data running through its system, Opower has been able to help its utilities’ customers save 700 million kilowatt-hours to date, which is equivalent to avoiding 1 billion pounds of greenhouse gas emissions, or about the annual emissions of 90,000 cars.
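As a quick sanity check, the three figures in that last sentence can be divided against one another to see what conversion factors they imply. The short Python calculation below uses only the article’s numbers plus the standard pounds-per-metric-ton conversion; the variable names are mine.

```python
LB_PER_METRIC_TON = 2204.62  # pounds in one metric ton

saved_kwh = 700e6   # kilowatt-hours saved (from the article)
avoided_lb = 1e9    # pounds of greenhouse gases (from the article)
cars = 90_000       # cars (from the article)

# Implied emissions per unit of electricity saved.
lb_per_kwh = avoided_lb / saved_kwh
# Implied annual emissions per car.
lb_per_car = avoided_lb / cars
tons_per_car = lb_per_car / LB_PER_METRIC_TON

print(f"{lb_per_kwh:.2f} lb of emissions per kWh saved")
print(f"{lb_per_car:,.0f} lb ({tons_per_car:.1f} metric tons) per car per year")
```

The figures imply about 1.43 pounds of emissions per kilowatt-hour and roughly 11,111 pounds (about 5 metric tons) per car per year, so the three numbers are at least consistent with one another.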

Big data for energy sensors

While Opower might be one of the firms leading this new trend, it isn’t the only energy project embracing the cloud and big data. An open-source project called the openPDC is a framework for collecting and storing data from power grid sensor devices several thousand times per second; that data includes voltage, current, frequency and location.
The Tennessee Valley Authority started working on an early version of the openPDC in 2004, and the open-source project officially launched in 2009. The developers of the framework realized they would need big data tools like Hadoop to manage and analyze such a large set of data. The openPDC embraces both the Hadoop Distributed File System and MapReduce, and the organizers of the program opted for HDFS because it could run on commodity hardware, which means a lower cost of deployment.
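For readers unfamiliar with MapReduce, the pattern can be sketched in a few lines of plain Python. This is not openPDC code and does not use the Hadoop API; the sensor names, the readings and the choice to average frequency per sensor are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sensor records: (sensor_id, voltage, current, frequency_hz).
readings = [
    ("sensor-a", 230.1, 10.2, 59.98),
    ("sensor-b", 229.8, 11.0, 60.02),
    ("sensor-a", 230.3, 10.4, 60.00),
    ("sensor-b", 229.9, 10.9, 60.04),
]

def map_phase(record):
    """Emit a (key, value) pair: the sensor id and its frequency reading."""
    sensor_id, _voltage, _current, frequency_hz = record
    return sensor_id, frequency_hz

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single average."""
    return key, sum(values) / len(values)

grouped = shuffle(map_phase(r) for r in readings)
averages = dict(reduce_phase(k, v) for k, v in grouped.items())
print(averages)
```

In a real cluster the map and reduce steps run in parallel across many machines, with the input split into blocks stored in HDFS. That is what makes the approach attractive for data arriving thousands of times per second.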

Why big data about energy is important

The power grid is just beginning to add information technology that will enable computing, sensors, smart meters and software to collect energy data about consumption, available clean power and energy efficiency.
Smart meters — which can read your energy consumption every 15 minutes — are just being installed in major cities. Digital two-way thermostats are appearing on the shelves of big-box retailers like Best Buy. As these devices spread they will generate data that utilities will be able to use to better manage the load on the grid.
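To get a feel for the volume involved, the sketch below works out how many readings a single meter produces at a 15-minute interval and rolls a series of interval readings up into daily totals. The sample values and the helper name are invented for illustration.

```python
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 15  # read interval mentioned in the article

readings_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
readings_per_year = readings_per_day * 365

def daily_totals(interval_kwh):
    """Sum consecutive 15-minute readings into one kWh total per day."""
    return [
        sum(interval_kwh[i:i + readings_per_day])
        for i in range(0, len(interval_kwh), readings_per_day)
    ]

# Two hypothetical days: a flat 0.25 kWh, then 0.5 kWh, per interval.
sample = [0.25] * readings_per_day + [0.5] * readings_per_day
totals = daily_totals(sample)

print(readings_per_day, readings_per_year, totals)
```

One meter yields 96 readings a day and 35,040 a year, compared with the 12 readings a year that a monthly meter read provides.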
Decades down the road, when the power grid has gone truly digital, there will be an overwhelming explosion of energy data, and it will take smart algorithms and software to crunch this wealth of data and help manage energy efficiently. Those managing such a large amount of data will inevitably need the next generation of big data tools. Big data, say hello to big energy.