Big data meets the connected car: Researchers tackle the vehicular network

Soon our cars will be the most connected devices we own. Consequently they could generate the most expensive monthly data bills of any device we own.
Cars will have built-in Wi-Fi, allowing them not only to share data but quite possibly to act autonomously on that information. If carriers like Verizon get their way, every car will have embedded LTE, allowing it to grab any manner and any quantity of content from the airwaves. But not all radio connections are created equal.
Wi-Fi is essentially free, while cellular data is expensive. The seeming liberation of an always-connected vehicle could easily be constrained by the shackles of an enormous cellular bill. Is there a way we can maximize the “free” connectivity of Wi-Fi while minimizing the costs of mobile data?
That’s the question distributed computing researchers from MIT, Georgetown University and the National University of Singapore are trying to answer, and it’s a doozy of a problem. A distributed network of cars is by definition ad hoc: the vehicles are constantly moving in relation to one another. They’re forming new Wi-Fi connections while breaking old ones, changing their positions within the network or leaving the network entirely. Getting these mobile and unpredictable nodes to cooperate is going to be difficult.
But MIT graduate student Alex Cornejo said math can be used to wrestle just such a network out of freeway chaos. He and his colleagues have developed an algorithm that would allow hundreds of different cars to aggregate their internet-bound data and send it compressed over a single cellular connection, thus reducing bandwidth costs for all the vehicles participating.
The process starts with two cars in Wi-Fi range, both hoping to establish an internet connection to download content, send email, upload documents or perform some other action. One car passes its data along to the other. Which car hands off its data is initially determined randomly, but as the vehicles move throughout the network, patterns start emerging. Those patterns determine which vehicles become aggregation nodes for the network, Cornejo said.
“We bias the coin toss,” Cornejo explained. “Cars that have already aggregated a lot will start ‘winning’ more and more, and you get this chain reaction. The more people you meet, the more likely it is that people will feed their data to you.”
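To make the biased coin toss concrete, here is a minimal Python sketch of the idea as Cornejo describes it. It is not the researchers' actual algorithm. The function names (a_wins, simulate), the rule that a car wins with probability proportional to the data it already holds, and the assumption that any two cars are equally likely to meet are all illustrative choices that do not come from the researchers.

```python
import random


def a_wins(held_a, held_b, rng):
    """Biased coin: car A wins with probability held_a / (held_a + held_b).

    Assumption for illustration only: the bias is proportional to how
    much data each car has already aggregated.
    """
    return rng.random() < held_a / (held_a + held_b)


def simulate(num_cars, num_meetings, seed=0):
    """Return how many cars' worth of data each car holds after the meetings."""
    rng = random.Random(seed)
    held = [1] * num_cars  # every car starts out holding only its own data
    for _ in range(num_meetings):
        # Assumption: any pair of cars is equally likely to come into Wi-Fi range.
        a, b = rng.sample(range(num_cars), 2)
        if held[a] + held[b] == 0:
            continue  # neither car is holding anything, so nothing to hand off
        if a_wins(held[a], held[b], rng):
            winner, loser = a, b
        else:
            winner, loser = b, a
        held[winner] += held[loser]  # the loser feeds its data to the winner
        held[loser] = 0
    return held


if __name__ == "__main__":
    held = simulate(num_cars=1000, num_meetings=20000)
    aggregators = sum(1 for h in held if h > 0)
    print("cars still holding data:", aggregators)
```

Because a car holding more data is more likely to win its next meeting, the data in this toy model piles up on a shrinking number of cars, which is the chain reaction Cornejo describes. Real traffic is far less uniform than the random meetings assumed here.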
When any given car has aggregated enough data, it establishes its cellular connection, uploading aggregated data to the internet or downloading data, which it then distributes back through the same ad hoc network, Cornejo said. The amount of time spent aggregating is determined by the type of data, he added. Data with a longer shelf life, like email, could be passed back and forth between hundreds of vehicles before it exits the network. Real-time applications would have far less tolerance for delay, but he said it would be possible for two vehicles making VoIP calls or holding video chat sessions to share a single cellular connection.
In theory, a fleet of 1,000 cars could see all of their data aggregated into just five cellular links, even accounting for cars that suddenly break from the network, taking their stored-up data with them, Cornejo said. The key is for the algorithm to define distinct clusters of cars among seemingly random traffic patterns. If the distinctions between those clusters start breaking down, such as when one platoon of traffic crosses paths with another, then the whole system breaks down.
That’s the paradox of connectivity, Cornejo said. If you have 1,000 cars in a single big cluster, data can be aggregated. If you have 10 well-defined clusters of 100 cars each, again data can be aggregated. But if you have two clusters of 500 cars in the vicinity of one another – with data occasionally being passed back and forth between each cluster – then aggregation becomes impossible.
Traffic Image courtesy of Flickr user