Five reasons why data growth will outstrip processing for the foreseeable future

A while back, I documented what I called, with no small amount of hubris, “Jonno’s first law” – namely, that data will always be created at a greater rate than it can be processed. This principle, I believe, is fundamental to why we will fail to see the ultimate vision of artificial intelligence (which I first studied at university 30 years ago) become reality, perhaps for some decades.
So, what is driving the ‘law’? Most simply, Moore’s Law, which states that the number of transistors on a chip will double roughly every two years, is not the only principle at play. Others are economic, contextual, or consequences of the way we choose to create data. While I haven’t done the maths, here are some of the reasons why Jonno’s first law will continue to apply:

  1. Data creation requires less processing than data interpretation. Data is easy to generate from even the least smart of sensors. It is also easy to duplicate with minimal processing and/or power, as illustrated by passive RFID tags and ‘smart’ paving slabs.

Corollary: A small modification to a complex data set can be difficult to represent as a set of differences from the original, meaning it is more likely to result in two complex data sets.

  2. There is always more data to be captured. Current business approaches rest on gaining an advantage by accumulating more data. Equally, the human desire to progress implies higher-quality images and frame rates, larger screens, more detailed information from manufacturing systems and so on.

Corollary: The universe cannot be measured molecule by molecule – it is too vast, the data set too big to capture without a similarly vast set of measures. Heisenberg’s uncertainty principle comes into play both at the lowest level and in how captured data influences human behaviour.

  3. The number of data generators is increasing faster than the number of processors. For example, digital cameras and, indeed, mobile phones are used mainly for content generation. Roughly 100 million servers exist in the world, compared to some 10 billion phones.

Corollary: While processing continues to commoditise, data generation continues to fragment. Cloud computing manifests the former, and consumers, the latter.
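To put the figures quoted above in proportion, here is a minimal sketch of the arithmetic. The counts are the rough numbers from the text, not verified statistics, and the function name is purely illustrative.

```python
# Figures as quoted in the text; treat them as rough, unverified estimates.
SERVERS = 100_000_000      # "100 million servers"
PHONES = 10_000_000_000    # "10 billion phones"


def generators_per_processor(generators, processors):
    """Return how many data-generating devices exist per processing device."""
    return generators / processors


phones_per_server = generators_per_processor(PHONES, SERVERS)
print(f"{phones_per_server:.0f} phones per server")
```

On these numbers, each server would have to keep pace with the output of many phones, which is the imbalance the point above describes.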

  4. Current models charge less (or zero) for data generation, more for processing. Consumer-oriented data generation is paid for by advertising, which covers its costs but leaves little for large-scale processing of the information.

Corollary: Consumer-based data generation is fragmented due to competitive interests, as multiple organisations (Facebook, Amazon) are growing their businesses primarily through data accumulation and only secondarily through interpretation.

  5. Many algorithms have only recently become feasible. Much of the mathematics around current AI, machine learning and so on is well established, but processing was too expensive to handle it until recently. The potential of such algorithms is therefore still being worked through.

Corollary: The algorithms we use are based on human understanding, not computer understanding. This means that our ability to process information is bottlenecked not only by processing power, but also by our ability to create suitable algorithms – which themselves depend on the outputs of such processing.
We still lack an understanding of how to automate the interpretation of data in an intelligent way, and will do so for the foreseeable future. As Mark Zuckerberg notes, “In a way, AI is both closer and farther off than we imagine… we’re still figuring out what real intelligence is.” A final hypothesis is that such a leap in understanding will be required before processing capability can actually ‘leapfrog’ data growth.
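Although I haven’t done the maths properly, the shape of the ‘law’ can be sketched as two compounding growth rates. The snippet below is only an illustration under stated assumptions: both data volume and processing capacity start at 1 and double at fixed periods, and the doubling periods used are hypothetical values, not measurements.

```python
def data_to_processing_ratio(years, data_doubling_years, processing_doubling_years):
    """Return how far data volume has pulled ahead of processing capacity.

    Both quantities start at 1 and double at their own fixed period, so the
    ratio is 2 ** (years / data_doubling_years - years / processing_doubling_years).
    """
    data = 2 ** (years / data_doubling_years)
    processing = 2 ** (years / processing_doubling_years)
    return data / processing


# Hypothetical doubling periods, chosen only for illustration.
DATA_DOUBLING_YEARS = 1.5
PROCESSING_DOUBLING_YEARS = 2.0

for year in (0, 5, 10, 20):
    ratio = data_to_processing_ratio(
        year, DATA_DOUBLING_YEARS, PROCESSING_DOUBLING_YEARS
    )
    print(f"year {year:2d}: data is {ratio:.2f}x processing capacity")
```

As long as data doubles faster than processing, the ratio keeps growing. Only a change in the processing doubling period, the kind of leap hypothesised above, would close the gap.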