Insights from the NFL’s Play-by-Play Dataset: What business leaders can learn from football

Having trouble understanding all the hoopla around big data and the importance of being an information-driven business? A big part of data’s value is having the power to challenge assumptions and preconceived notions to ensure you are making smarter business decisions.

I recently analyzed the NFL’s “Play-by-Play” dataset to test some of my own preconceived notions around football. I’ve been watching football for many years, and like many folks, I have longstanding preconceived notions about plays, team stats, weather and stadium conditions, and the like.

One of my longstanding football preconceptions centered on high altitude games, like those played at Denver’s Mile High Stadium, which sits at an elevation of 5,130 feet (the average elevation of NFL stadiums is 526 feet). Higher elevations mean less oxygen, which can affect player performance.

During this year’s Broncos vs. Ravens season opener, players were shown inhaling pure oxygen to help prevent altitude issues. This got me thinking: does altitude really affect gameplay, and could I use data to prove it?


Challenging preconceived notions

When I started working with the NFL dataset, my assumption was that games played in one location could have substantially different outcomes if played elsewhere. I compared games played in Denver with games played elsewhere, looking at average scores and the mix of play types (pass, run, etc.). I could not find an appreciable difference for games played in Denver, other than a 1 percent increase in pass plays. The data showed that my preconceived notions about altitude in football were incorrect.
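For readers who want to try this kind of check themselves, here is a minimal sketch of a Denver-versus-elsewhere comparison. It is not the query used for this article: the row layout, the function name mean_points_by_site and every number in it are made up for illustration.

```python
from statistics import mean

# Hypothetical rows: (stadium city, combined points scored in the game).
# These values are invented for illustration, not taken from the NFL dataset.
games = [
    ("Denver", 44),
    ("Denver", 41),
    ("Baltimore", 40),
    ("Kansas City", 47),
    ("Pittsburgh", 42),
]


def mean_points_by_site(rows, city):
    """Return (mean points in `city`, mean points everywhere else)."""
    at_city = [pts for site, pts in rows if site == city]
    elsewhere = [pts for site, pts in rows if site != city]
    return mean(at_city), mean(elsewhere)


denver_avg, other_avg = mean_points_by_site(games, "Denver")
print(f"Denver: {denver_avg:.1f}, elsewhere: {other_avg:.1f}")
```

The same split works for any other measure, such as the share of pass plays, by swapping the points column for that measure.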

I also assumed that coaches would always choose to punt the ball on fourth down. Whenever a team decides to go for it on fourth down, the commentators make a big deal about the play, because it challenges the popular consensus. However, the data says that this happens more often than I thought: 15 percent of fourth downs are not kicks.
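A figure like this comes down to a filter and a ratio. The sketch below assumes each fourth-down play carries a play-type label; the labels PUNT, RUN and PASS and the sample plays are hypothetical, not the dataset's actual values.

```python
# Hypothetical fourth-down plays, each tagged with a made-up play type.
fourth_down_plays = [
    "PUNT", "PUNT", "FIELDGOAL", "RUN",
    "PUNT", "PASS", "PUNT", "FIELDGOAL",
]

# Assumption: punts and field goal attempts are the only "kick" play types.
KICKS = {"PUNT", "FIELDGOAL"}


def non_kick_share(plays):
    """Percentage of plays that are not kicks."""
    non_kicks = sum(1 for play in plays if play not in KICKS)
    return 100 * non_kicks / len(plays)


share = non_kick_share(fourth_down_plays)
print(f"{share:.0f} percent of fourth downs in this sample are not kicks")
```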

A bigger stadium means there will be a larger fan base and should translate into higher scores for the home team, right? Again, the data refutes this notion, showing that games played in smaller stadiums actually have higher average scores than those played in large ones: 20.55 to 17.79.

As these examples show, leveraging data analysis ensures you are operating on fact, rather than assumption. You likely have some preconceived notions about your business that are not supported by the data. Going into a game or a business scenario armed with inaccurate information can mean the difference between success and failure. Data is the key to informed decision-making.

Seeing the outcomes

A game is interesting because its outcome is not predetermined. On any given Sunday, either team can win. A football game is broken down into drives, in which the offense attempts to move down the field and score, and it’s the defense’s job to prevent them from scoring.

The End of a Drive

The pie chart above shows how drives end on average, which tells us how often the offense or the defense succeeds. It’s no surprise that punts are the most common ending to a drive. When a team decides to punt, the defense has done its job and prevented the offense from scoring. The data shows that the offense succeeds in scoring a touchdown (EXTRAPOINT) 18 percent of the time and a field goal (FIELDGOAL) 15 percent of the time.
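A breakdown like this can be produced by counting how each drive ends and dividing by the total. The sketch below is only an illustration: EXTRAPOINT, FIELDGOAL and INTERCEPTION are the labels mentioned in this article, the PUNT label is an assumption, and the sample drives are invented.

```python
from collections import Counter

# Hypothetical list of drive endings, one label per drive (made-up sample).
drive_endings = [
    "PUNT", "PUNT", "PUNT", "PUNT", "PUNT",
    "EXTRAPOINT", "EXTRAPOINT",
    "FIELDGOAL", "FIELDGOAL",
    "INTERCEPTION",
]


def ending_shares(endings):
    """Map each ending label to its percentage of all drives, most common first."""
    counts = Counter(endings)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.most_common()}


shares = ending_shares(drive_endings)
for label, pct in shares.items():
    print(f"{label}: {pct:.0f} percent")
```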

It’s also interesting to see how many times drives result in a “non-standard” ending. These are endings to a drive that the offense does not want to see happen, such as an interception (INTERCEPTION), which happens 7 percent of the time.

Percentage of Scores by Yard Line

The starting yard line heavily influences the outcome of a drive. The figure above shows the percentage of drives that score, based on the yard line where the drive starts. In this chart, the 1-yard line is the closest the offense can be to scoring and the 100-yard line is the farthest away. As you might expect, the drives with the most yardage to cover have the most difficulty scoring. Drives that start in the red zone (the 20-yard line and closer) score 78 percent of the time. Conversely, drives that start on the 80-yard line and farther back score 21 percent of the time. Those longer drives are also 2.6 times more likely to end in an interception: there are more yards to cover, and therefore more opportunities for something to go wrong along the way.
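One way to compute scoring rates like these is to group drives by where they start and take the share that score in each group. The sketch below assumes each drive is recorded as its starting distance from the end zone plus a scored flag; the three bucket names, the helper functions and the sample drives are all hypothetical.

```python
# Hypothetical drives: (yards from the end zone at the start, did the drive score?).
# These rows are invented for illustration only.
drives = [
    (15, True), (10, True), (18, False),
    (55, True), (60, False),
    (85, False), (92, False), (88, True),
]


def bucket(yards_to_go):
    """Assign a starting position to one of three illustrative buckets."""
    if yards_to_go <= 20:
        return "red zone (1-20)"
    if yards_to_go >= 80:
        return "deep (80-100)"
    return "midfield (21-79)"


def score_rate_by_bucket(rows):
    """Percentage of drives that scored, per starting-position bucket."""
    totals = {}
    for yards, scored in rows:
        name = bucket(yards)
        made, seen = totals.get(name, (0, 0))
        totals[name] = (made + int(scored), seen + 1)
    return {name: 100 * made / seen for name, (made, seen) in totals.items()}


rates = score_rate_by_bucket(drives)
for name, pct in rates.items():
    print(f"{name}: {pct:.0f} percent of drives scored")
```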

Outcomes of football drives are a lot like the outcomes of a sale in business: without analysis, you may be starting your salespeople at the 99-yard line, with a low conversion rate and a high chance of the competition stealing your sale. By analyzing the data, you can place your sales closer to the red zone – and a touchdown.

Augmenting data with more data

The original Play-by-Play dataset had details on specific plays, the yard line, the date and the teams involved. I could answer some interesting questions using this dataset — e.g., what percentage of drives end in a field goal — but I could not answer other questions.

The outcome of a play is influenced by more than just the players moving the ball around the field. There are other factors, like weather and turf type, which were not included in the original NFL dataset, so I augmented it with weather and stadium data, and then ran a series of queries to understand how weather affects gameplay.
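Augmenting one dataset with another usually means joining them on a shared key. The sketch below joins game records to a weather lookup on (date, home team) and then averages home scores by weather. Everything in it is an assumption made for illustration: the field names, the team abbreviations, the INCLEMENT set and the sample rows are not from the datasets used here.

```python
from statistics import mean

# Hypothetical game records (made-up dates and scores).
games = [
    {"date": "2012-09-09", "home": "BAL", "home_pts": 24, "away_pts": 13},
    {"date": "2012-09-16", "home": "KC", "home_pts": 17, "away_pts": 27},
    {"date": "2012-09-23", "home": "BAL", "home_pts": 20, "away_pts": 16},
]

# Hypothetical weather lookup keyed by (date, home team).
weather = {
    ("2012-09-09", "BAL"): "rain",
    ("2012-09-16", "KC"): "clear",
    ("2012-09-23", "BAL"): "rain",
}

# Assumption: which conditions count as inclement.
INCLEMENT = {"rain", "snow"}


def augment(game_rows, weather_by_key):
    """Return copies of the game rows with a 'weather' field added."""
    return [
        {**g, "weather": weather_by_key.get((g["date"], g["home"]), "unknown")}
        for g in game_rows
    ]


def avg_home_score(rows, team, inclement):
    """Average home points for `team` in inclement (or fair) weather, or None."""
    pts = [
        r["home_pts"]
        for r in rows
        if r["home"] == team and (r["weather"] in INCLEMENT) == inclement
    ]
    return mean(pts) if pts else None


augmented = augment(games, weather)
print(avg_home_score(augmented, "BAL", inclement=True))
```

Games with no matching weather record are kept and marked "unknown" rather than dropped, so a gap in the second dataset does not silently shrink the first.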

The data showed that in inclement weather, the Baltimore Ravens perform the best at home, winning by an average of 21.7 to 14.2, while the Kansas City Chiefs perform the worst, losing by an average of 23.8 to 28. When there is no inclement weather, the Pittsburgh Steelers perform the best, winning 23.8 to 13.6.

Applying this example to business: don’t limit the types and breadth of questions you ask of your dataset. Instead, consider augmenting it with other relevant data sources so you can ask more sophisticated questions. Deciding what new datasets to incorporate requires that you first think about the types of questions you want to ask.

The coach and the CEO

So how can a business executive go about using data to improve their team? First, they need to decide how data-driven their team will be. Decisions cannot be run like algorithms (i.e., a deterministic decision computed from a given set of data). Instead, data should be leveraged to augment our decision-making by confirming or disproving our preconceived notions.

My preconceived notions about football games were on the money sometimes, but were often wrong. Basing coaching decisions on my incorrect assumptions would have led to devastating losses.

Luckily, we have amazing tools at our fingertips to help us extract insights from data more quickly and easily. Technologies like Hadoop are ushering in a new era where data-driven business decisions can be made, no matter how big your questions are or how daunting the challenge or dataset may seem. Data offers a massive opportunity to challenge or confirm our preconceived notions about business, society and yes, even football. Becoming a data-driven organization will not only help you separate your business from the pack, but will also give you the power to make better plays and achieve big wins.

Jesse Anderson is a curriculum developer and instructor at Cloudera.
We’ll be discussing many of these topics and concrete examples like this at our Structure Data event March 19 and 20 in New York.