How Google Uses Data to Make a Better Google

Alfred Spector, Google, at Structure Big Data 2011

Making sense of vast amounts of data is getting easier thanks to processor improvements, faster networks and a growing amount of cloud storage capacity, but there’s another factor that’s accelerating the ability to sift through information: user communities. At the Structure Big Data event on Wednesday, Alfred Spector, a VP of Research and Special Initiatives at Google (s goog), illustrated how low-level user data can be combined with the massive information stores and cloud computing services offered by his company.

Perhaps the most prominent example is Google’s geographic data, used in both the Google Maps and Google Earth products. The company harvests global information to create products that are useful in their own right, but each can be supplemented with localized user data. A modern data management web app makes it easy for Google to host and manage data tables or personalized maps, and to let users collaborate on and publish them. For example, Google Maps data combined with information from hospitals and doctors can easily show which nearby health-care providers have flu vaccines available.
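As a rough illustration of the flu vaccine example, the Python sketch below joins two small tables by provider ID and filters by distance from the user. It does not call any Google service: the provider IDs, coordinates, stock flags and the function name `nearby_with_vaccine` are all hypothetical, chosen only to show the shape of the combination.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Hypothetical "map" table: provider ID -> (latitude, longitude).
locations = {
    "clinic_a": (37.7793, -122.4193),
    "clinic_b": (37.7840, -122.4090),
    "clinic_c": (37.8044, -122.2712),
}

# Hypothetical user-supplied table: provider ID -> has flu vaccine in stock.
vaccine_stock = {"clinic_a": True, "clinic_b": False, "clinic_c": True}

def nearby_with_vaccine(user_lat, user_lon, max_km):
    """Return IDs of providers with vaccine within max_km, nearest first."""
    matches = []
    for provider_id, (lat, lon) in locations.items():
        if not vaccine_stock.get(provider_id, False):
            continue  # provider missing from the stock table or out of stock
        distance = haversine_km(user_lat, user_lon, lat, lon)
        if distance <= max_km:
            matches.append((distance, provider_id))
    return [provider_id for _, provider_id in sorted(matches)]

print(nearby_with_vaccine(37.7749, -122.4194, max_km=5))
```

The base geographic data and the localized stock data stay in separate tables, so either side can be updated independently by whoever maintains it.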

Making large amounts of data usable and modifiable by end users has the potential to create solutions that Google hasn’t envisioned yet. It has already enabled what Spector calls a “hybrid intelligence,” in which users and computers do more together than either could do individually. Scientists who track global warming may have access only to limited datasets that show a small part of the overall picture. Google Earth, however, can augment its base data with sensor information from various satellites and data points, providing a more holistic view of global warming.

This approach of combining user communities with data is leading to smarter machines as well. The voice search features offered by Google are becoming more accurate thanks to speech recognition data provided by users. In effect, the speech service is training itself by learning from all of the incoming data.

Just as they can with Google Maps data, end users can leverage these smarter machines as well. Spector said that end users could create a spam-killing blog moderator by training the system with examples of both good blog posts and spam comments. Those inputs, combined with Google’s prediction APIs and Python scripts, would effectively create an intelligent automated moderator that could continuously improve its own performance.
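To make the moderator idea concrete, here is a minimal Python sketch of training a classifier on labeled examples. It is not Google’s prediction API and makes no network calls: it swaps in a small naive Bayes classifier written with the standard library, and the class name `TinyModerator`, the labels `"ok"` and `"spam"`, and the sample comments are all hypothetical.

```python
import math
import re
from collections import Counter

class TinyModerator:
    """Hypothetical stand-in for a hosted prediction service:
    a multinomial naive Bayes text classifier using only the stdlib."""

    def __init__(self):
        self.word_counts = {"ok": Counter(), "spam": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Lowercase the text and keep runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        # Each labeled example updates the per-label word statistics.
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def classify(self, text):
        vocab = set(self.word_counts["ok"]) | set(self.word_counts["spam"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior with add-one smoothing over the two labels.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(counts.values())
            for word in self.tokenize(text):
                # Add-one smoothing so unseen words never give log(0).
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocab) + 1)
                )
            scores[label] = score
        return max(scores, key=scores.get)

moderator = TinyModerator()
moderator.train("Great write-up on the keynote, thanks for the summary", "ok")
moderator.train("Interesting point about speech recognition data", "ok")
moderator.train("Buy cheap pills online now, click here", "spam")
moderator.train("Click here to win free money now", "spam")

print(moderator.classify("click here for cheap free pills"))
print(moderator.classify("thanks for the interesting summary"))
```

Because `train` can be called at any time, every comment a human moderator approves or rejects can be fed back in as a new labeled example, which is one simple way such a system could keep improving.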
