Can fun spawn the data scientists of the future?

We live in a big data world, full of complex algorithms running over any type of information one can imagine. Gaining the skills to work with it requires a lot of work, however — and the first step in changing that might be realizing that data can be fun.

The data visualization geek behind Facebook’s Timeline

Facebook’s new Timeline feature tries to make sense of the things you’ve been sharing to tell the story of your life. To do so, the company turned to someone who became famous for making infographics about the music he likes and the booze he drinks.

How technology is changing business [Infographic]

Society as we know it is going through a radical makeover, thanks to constant connectivity everywhere. It’s changing our infrastructure needs and it is also increasing the velocity of business. Progress Software has crafted an infographic based on Economist Intelligence Unit research that captures this change.

Twitter by the numbers [Infographic]

Did you know that nearly 40% of all tweets come from a mobile device? Nearly 56% of Twitter users are women, and 70% of Twitter accounts are based outside of the US. These and more Twitter facts can be found in this infographic.

How much do you love your phone?

A new survey by GPS mobile apps developer TeleNav tries to gauge the American mobile obsession, especially among iPhone users. The survey's findings are fun and somewhat surprising, and they have been summed up in this nifty infographic that is good for a giggle.

Big data needs to think outside the tech box

Web companies like Google and Facebook gain business advantage by analyzing large volumes of rapidly changing data about their users, but they are far from alone. A recent infographic from Get Satisfaction charts the volume of data stored in 17 key industry sectors, illustrating that most data comes from some surprising areas, such as manufacturing. But are well-known big data companies like Cloudera and its competitors in a position to address these largely untapped pools of content, bringing big-data-powered analytics to companies outside the technology sector?

In May, the McKinsey Global Institute published Big Data: the Next Frontier for Innovation, Competition and Productivity, a report from which Get Satisfaction drew its statistics. The McKinsey team notes that “data have swept into every industry and business function and are now an important factor of production, alongside labor and capital.” Data volumes are significant in the market segments that we might anticipate, including health care (434 petabytes in 2009), banking (619 petabytes) and ICT/communications (715 petabytes). But these are dwarfed by the 966 petabytes produced inside the manufacturing sector, where the infographic makes a point of highlighting McKinsey’s assertion that “big data has the potential to cut operating costs by nearly 50% across all sectors of manufacturing.”

Knowledge-based industries typically have the advantage when working with big data, combining educated staff and dependence on IT systems with processes and workflows that are typically more amenable to change than those based on physical production lines. But even in the physical environment that dominates manufacturing, there are opportunities to analyze data at scale.

For example, analyzing stock levels, currency fluctuations, fuel prices and the weather can lead to a leaner, more efficient global supply chain. With as many as 4,000 suppliers contributing components to a single automobile from a manufacturer like Ford or GM, the resulting web of dependencies may contain many opportunities for small improvements in the timing and manner of shipping components. McKinsey's analysis suggests that a further 2 to 3 percent could be added to the profit margin on every product. Two to three percent may not sound like much, but those cents rapidly add up when global businesses sell millions of units. A big data store like Hadoop can bring just as much to nuts, bolts and car stereos as it can to the tweets, Likes and transaction logs with which big data tools more typically work.
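To see how quickly those cents add up, here is a back-of-envelope sketch. The unit counts and prices below are hypothetical figures chosen for illustration, not numbers from the McKinsey report:

```python
# Illustrative arithmetic: how a 2-3% margin improvement compounds
# across high-volume manufacturing. All inputs are hypothetical.

def added_profit(units_sold, unit_price, margin_gain):
    """Extra annual profit from a fractional margin gain on each unit."""
    return units_sold * unit_price * margin_gain

# Hypothetical automaker: 2 million vehicles a year at $20,000 each.
units = 2_000_000
price = 20_000

low = added_profit(units, price, 0.02)   # 2% margin uplift
high = added_profit(units, price, 0.03)  # 3% margin uplift

print(f"2% uplift: ${low:,.0f}")   # $800,000,000
print(f"3% uplift: ${high:,.0f}")  # $1,200,000,000
```

Even at the low end of the range, a two-point margin gain on those assumed volumes is worth hundreds of millions of dollars a year.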

McKinsey’s analysis also identifies a significant shortage of employees with the skills to exploit data-based opportunities, suggesting that in the U.S. alone there could be a shortfall of 140,000–190,000 skilled staff. Even in companies where the value of big data analysis is recognized, such as LinkedIn and Google, it may therefore still prove difficult to recruit staff with the necessary skills. Cloudera is already building a business dominated by professional services and consultancy engagements with clients, making it one company well placed to expand and fulfill this need, provided it can acquire sufficient knowledge of the sectors (like manufacturing) into which it might grow.

Alongside a potential lack of skilled staff in sectors such as manufacturing, there may also be a gap between the technology companies with big data solutions to sell and the customers for whom these technologies could deliver value. For example, Cloudera highlights a number of customers for its Enterprise product; not one of these featured customers is from outside the ICT/knowledge-worker space. Over at Splunk, the list is more inclusive, and it even includes manufacturing. However, of eight manufacturing companies highlighted, only two (John Deere and Bridgestone) fall outside the high-tech area. And Lockheed Martin certainly does manufacture goods, but it also displays many characteristics of a knowledge enterprise.

McKinsey’s analysis and Get Satisfaction’s illustration point to massive and largely untapped potential, both in traditional information markets and beyond. But to adequately address new markets such as manufacturing, there will be a clear and continuing need for the industry to do more than simply sell today’s software.

Question of the week

Will the current generation of big data startups be able to deliver services to new markets, such as heavy industry?

Today in Cloud

One factor driving the cost reductions that many see in the public cloud is scale; massive data centers create economies of scale from which both the data center operators and their customers derive benefit. Wikibon published an infographic last week to illustrate this, suggesting that 1,000 servers in a 'traditional' data center cost $3,350,000 to run, compared to $798,000 to run 1,000 servers inside a 100,000-server 'budget' cloud facility. Staff savings are particularly notable in Wikibon's graphic, but it's worth remembering that this is a snapshot and that every data center is different. You may see higher savings in your own case, or none at all.
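The two totals from Wikibon's infographic can be reduced to a per-server figure and an overall savings percentage. The totals below are quoted from the graphic; the derived per-server and percentage numbers are my own arithmetic, not Wikibon's:

```python
# Back-of-envelope check of Wikibon's figures. The two totals come from
# the infographic; the per-server costs and savings % are derived here.

traditional_total = 3_350_000  # cost to run 1,000 servers, traditional DC
cloud_total = 798_000          # cost to run 1,000 servers, 'budget' cloud
servers = 1_000

traditional_per_server = traditional_total / servers  # $3,350 per server
cloud_per_server = cloud_total / servers              # $798 per server
savings_pct = (traditional_total - cloud_total) / traditional_total * 100

print(f"Traditional: ${traditional_per_server:,.0f}/server")
print(f"Cloud:       ${cloud_per_server:,.0f}/server")
print(f"Savings:     {savings_pct:.1f}%")  # ~76.2%
```

On those numbers the budget cloud facility runs each server for roughly a quarter of the traditional cost, though, as noted above, any individual data center may land well off this snapshot.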