Data, process, and people: three words that sum up analytics

As a data scientist for more than 15 years, I’ve seen firsthand the impact that analytics can have on the bottom line. I am asked all the time: what is the best analytics strategy? And I always answer: data, process and people.

Here are some ideas to improve your tactics.


There is a lot of information out there about the explosion of data. IBM has frequently mentioned that “90 percent of the data in the world today has been created in the last two years alone.” One of the fastest growing areas in the data explosion is computer-generated data. In the early days, this was mostly log files. But now, sensors are becoming cheaper and easier to use and devices are becoming more intelligent and connected.

Take, for example, the quantified-self movement. Now we see fitness freaks and professional athletes using devices like Fitbit, Jawbone and Nike+ FuelBand to monitor their exercise and sleep cycles. In the future we will see these devices collect more data, as with the intriguing Wello health tracker, which connects to your mobile device and measures heart rate, blood oxygen, blood pressure, temperature and ECG as well as lung function.

Another important class of data is data generated from business processes. Companies have been mining the “data exhaust” from business processes for decades. A lot of good work was done in that time, but many efforts were blunted by the lack of historical data: storing more than 90 days’ worth of data was considered too expensive, and it was difficult to bring data together across different silos. But the advent of Big Data tools makes it possible to store and analyze more data less expensively. Companies are now storing more and finding value in larger and larger datasets.


The “process” part of an analytics strategy has two parts: the process for finding value in the data (whether through traditional business intelligence techniques, visualization, or advanced machine learning) and the process for adding analytics into the business. Here are some suggestions for finding value in the data:

  • Start small and build credibility. One of the primary areas of failure for analytic projects that I’ve seen in my career has been the “monolithic” data warehouse. Years were spent defining the data structure and cleaning and populating the data, and by the end, the business had changed.
  • Work from a list of important questions, but leave time for discovery. I’m not a fan of completely unconstrained investigation. Without at least a few guiding questions, the likelihood of getting value is low. Conversely, it’s easy to leave value on the table if an analytics project is too rigidly structured.
  • Tell a good story. Many analytics presentations crash and burn because no one answered the question, “so what?” Almost as bad are the presentations with dense formulas and a single R² value. Take your audience on a data journey. Use good visualizations. Watch Hans Rosling’s TED talk.
  • Always look for more data. I’m not going to get into the middle of the “is more data better than a better algorithm” argument. There is much to be said for both sides, but when looking to add value with analytics, the more data the better: get a longer time series, break down a data silo, combine your data with open data.

And here are some suggestions for adding analytics into the business:

  • Apply analytics to more decisions. Anytime you look at business metrics there is an opportunity to use analytics. A hot area right now is the application of big data and analytics to HR processes.
  • Improve the accuracy of current analytic models. Machine learning models need to be continuously updated. There are many ways to improve the accuracy of analytics models: more or better data, better machine learning algorithms, even a better understanding of the problem domain.
  • Speed up the application of analytics. This is an area that I don’t think gets enough attention. If the analytic model has proven accuracy and can be produced in a timely fashion, use it. I’ve heard of businesses spending resources developing a good model and then continuing to rely on HiPPO (Highest Paid Person’s Opinion) decision making.
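
As a rough illustration of the suggestion above to break down a data silo, here is a minimal Python sketch that brings two separately stored datasets together on a shared key. Everything in it is an assumption made for illustration: the two "silos" (sales and service calls), the field names, the values and the `combine_silos` helper are all hypothetical, and a real project would more likely do this in a database or a data-frame library.

```python
from collections import defaultdict

# Hypothetical silo 1: sales records (made-up values for illustration only).
sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 40.0},
]

# Hypothetical silo 2: customer service calls (also made up).
service_calls = [
    {"customer_id": 1, "issue": "billing"},
    {"customer_id": 3, "issue": "delivery"},
]


def combine_silos(sales_rows, call_rows):
    """Build one per-customer view from two separately stored datasets."""
    # Each customer starts with zero sales and zero calls, so customers
    # that appear in only one silo are still kept in the combined view.
    combined = defaultdict(lambda: {"total_sales": 0.0, "service_calls": 0})
    for row in sales_rows:
        combined[row["customer_id"]]["total_sales"] += row["amount"]
    for row in call_rows:
        combined[row["customer_id"]]["service_calls"] += 1
    return dict(combined)


view = combine_silos(sales, service_calls)
for customer_id in sorted(view):
    print(customer_id, view[customer_id])
```

The point of the sketch is only that a question such as "do customers who call for service spend more?" cannot be asked of either dataset alone; it needs the combined view.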


Finding data scientists is difficult and is likely to get worse before it gets better.

More and more universities are developing training programs as part of their current curriculum as well as specialty certificate programs. But while it’s easy to repurpose classes in computer science and statistics, teaching someone how to tell a good story with data is tough. Domain knowledge is a controversial subject as there have been some high-profile examples of data scientists generating great results with little knowledge of the subject. In my opinion, however, some knowledge of the problem domain is desirable.

The good news is that you don’t need a team composed entirely of data scientists; you just need a team with a data scientist. My suggestions for building a good team are:

  • Of the four major areas (computer science, statistics, storytelling and domain knowledge), the IT component has the hardest job: it has to maintain the infrastructure as well as implement analytic results.
  • Try to get everyone up to speed on the domain knowledge as quickly as possible.
  • Don’t be surprised if some of your IT and business analysts already have a good foundation in statistics. Get them some additional training and move them into junior data scientist positions.
  • Look for strategic hires outside of your company. Being able to look at problems in a different way is a big asset in a data scientist.
  • Almost everyone on the team should be able to pull their own data and do some simple analytics.
  • At least 50 percent of the team should be located at corporate headquarters. Up this to 100 percent when you’re first starting the team.
  • Engage with academics, vendors and outside partners, but be careful. Data has value, and your analytics are the intellectual property that extracts that value; don’t give either away.

Always remember those three words (data, process and people) and you can’t go wrong.

Mike Cavaretta is a data scientist and manager at Ford Motor Company in Dearborn, Michigan. He’ll be speaking at Structure Data on Wednesday, March 19, in New York.