[Advanced Stats] EPA: Are All Yards Equal?

What is an advanced stat? We explain everything to you in a series of articles that will be published throughout the season.

Expected Points Added (EPA) is the first to go under the editorial microscope.

Where does it come from?

The concept of Expected Points (EP) was first introduced in 1970 by then-Bengals quarterback Virgil Carter and Northwestern professor Robert Machol. Their article quantified a concept intuitively understood by all football fans: having the ball is good, having it close to the opponent's end zone is better.

The notion was then refined and clarified in the founding book of the statistical approach to American football: “The Hidden Game of Football”.
Released in 1988 and co-written by Bob Carroll, John Thorn and Pete Palmer, the book first brought the notion to a wider audience, before several American journalists took it up more broadly over the past five years.

What’s the point?

Expected points are based on the idea that not all yards are equal. A 5-yard run on 3rd & 2 and a 5-yard run on 3rd & 10 count for the same total yards at the end of the game, but they don’t have the same value. The first earns a first down, lets the team continue its drive and therefore increases its probability of scoring. The second leaves the team at 4th & 5, still short of the first down. The purpose of EP is to give context to what happens on each play in order to understand the value each play adds.

“Expected points” are used today by many analysts and teams to better understand what is happening on the field.

How does it work?

Expected Points Added builds on Expected Points (EP). To define EPA, we therefore first need to explain EP.

EP is used to put a play in context. For each situation (for example, 2nd & 10 at the 25-yard line), EP estimates how many points, on average, a team can be expected to score on this possession.

These averages are calculated from the hundreds of thousands of plays that have previously taken place in the NFL. Basically, if a team is at 2nd & 5 on its own 35-yard line, we look at all the similar situations from the last 10 years and count how many times the possession ended in a touchdown, a field goal, a punt… This is a very simplified way of looking at things, but it sums up the principle of this statistic well.
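
To make the principle concrete, here is a minimal sketch of that lookup in Python. It is not the method of any published model: the play log `HISTORY`, its layout and the signed point values are assumptions invented for illustration, and the function `expected_points` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical historical plays (invented data):
# (down, yards_to_go, yards_from_own_goal_line, next_score_points)
# next_score_points is signed from the offense's point of view (an assumption):
# +7 touchdown, +3 field goal, 0 no score, negative if the opponent scores next.
HISTORY = [
    (2, 5, 35, 7),
    (2, 5, 35, 0),
    (2, 5, 35, 3),
    (2, 5, 35, -7),
    (1, 10, 95, 7),
    (1, 10, 95, 7),
    (1, 10, 95, 3),
]


def expected_points(history):
    """Average the signed point value over all plays that share a situation."""
    totals = defaultdict(lambda: [0, 0])  # situation -> [sum of points, play count]
    for down, to_go, yardline, points in history:
        bucket = totals[(down, to_go, yardline)]
        bucket[0] += points
        bucket[1] += 1
    return {situation: total / count for situation, (total, count) in totals.items()}


ep_table = expected_points(HISTORY)
print(ep_table[(2, 5, 35)])  # 0.75 with the invented data above
```

Real models do not rely on exact matches like this, since many situations are too rare; they smooth over similar situations instead. The sketch only shows the averaging idea.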

The greater the EP value, the greater the probability that the team with the ball scores points. Conversely, the more negative the value, the more likely the opposing team is to score the next points.

To give more context, the graph below shows EP as estimated by several models, depending on the offense's position on the field:

Three models are shown here (nflscrapR, Hidden Game of Football, Carter), but the most widely used today is the publicly available nflscrapR model. In this model, a 1st down at the opponent’s 5-yard line is given an EP value of 6, because this possession has a very high probability of ending in a touchdown. Conversely, a 3rd down at a team's own 25-yard line gives a slightly negative value, because the opposing team is more likely to score the next points.

Most EP models actually rely on several contextual factors beyond field position and down, which are the only two used above.

We can cite for example:

The number of yards to go to get a first down
The point differential
The number of minutes left before half-time

These elements refine the context of each play. For example, including the point differential at the time of the play gives less weight to plays that occur once the game is already decided. Indeed, it is not uncommon for teams with a large lead to relax and allow the opposing team to score a few points that ultimately have no bearing on the result. The larger the gap on the scoreboard, the less value a team's plays add to the outcome of the game.

Now that EP has been defined, calculating EPA is actually very simple. Suppose a team faces 1st & 10 on its own 25-yard line. As defined above, this situation has its own EP value: EP(1).

On this first down, the quarterback completes a 5-yard pass, bringing his team to 2nd & 5 on its own 30-yard line. This situation also has an assigned EP value: EP(2). The EPA value of this play is simply the difference in EP between the two situations:

EPA = EP(2) – EP(1)

EPA thus measures by how much the average number of points expected on this drive has changed thanks to the play that has just taken place.

The greater the EPA value of a play, the more that play has increased the team's chances of scoring points. This number therefore quantifies the real added value of each play.
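
The subtraction can be sketched in a few lines of Python. The EP values in the table below are hypothetical, chosen only to make the arithmetic visible; they do not come from nflscrapR or any other model, and the function name `epa` is an assumption.

```python
# Hypothetical EP values for the two situations in the example (invented numbers).
EP = {
    ("1st & 10", "own 25"): 1.0,   # EP(1), before the play
    ("2nd & 5", "own 30"): 1.25,   # EP(2), after the 5-yard completion
}


def epa(ep_before, ep_after):
    """EPA of a play: EP after the play minus EP before it."""
    return ep_after - ep_before


value = epa(EP[("1st & 10", "own 25")], EP[("2nd & 5", "own 30")])
print(value)  # 0.25 with the invented values above
```

A play that leaves the offense in a worse situation than before would give a negative result from the same function.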

Returning to the example at the beginning of the article, for the same gain of 5 yards, EPA clearly highlights the difference in added value between the two runs:

How to use them?

EPAs are generally expressed per play: a team’s EPAs are added together and then divided by its number of plays to remove volume effects. They are always measured from the offense's perspective, so a good defense will have an EPA total below zero.
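
As an illustration of this per-play scaling, here is a minimal Python sketch. The play log `PLAYS`, the team names "A" and "B", the EPA values and the helper `epa_per_play` are all hypothetical. Each EPA value is recorded from the offense's perspective, so the same number is credited to the offense and charged to the defense that allowed it.

```python
from collections import defaultdict

# Hypothetical play log: (offense, defense, epa). All values are invented.
PLAYS = [
    ("A", "B", 0.5),
    ("A", "B", -0.25),
    ("A", "B", 1.0),
    ("B", "A", -0.5),
    ("B", "A", 0.25),
]


def epa_per_play(plays, side):
    """Mean EPA per play, grouped by offense (side=0) or by defense (side=1)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for play in plays:
        team, value = play[side], play[2]
        sums[team] += value
        counts[team] += 1
    return {team: sums[team] / counts[team] for team in sums}


offense = epa_per_play(PLAYS, side=0)  # higher is better for an offense
defense = epa_per_play(PLAYS, side=1)  # lower (below zero) is better for a defense
print(offense)
print(defense)
```

With this invented log, team A's defense allows -0.125 EPA per play, which is the kind of below-zero figure a good defense shows.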

Here are some examples of using EPAs. The data has not been adjusted for era or schedule.

The limits of the statistic

Like any statistic, Expected Points have limitations, especially when the data is used raw.

EPAs cannot isolate individual performance. They work reasonably well for quarterbacks because of their importance to an offense’s level of play, but they are much less reliable for running backs or receivers. Even for quarterbacks, this statistic alone sometimes makes it difficult to separate the player from his offensive system.

They also cannot quantify the impact that certain plays can have. A particular formation or play may matter later in the game, for example by establishing a tendency. The same is true over longer periods: the defenses of Seattle 2013 and Denver 2015 do not appear in the top 10 defenses above despite having had a big impact on the defensive future of the league.

It is also sometimes necessary to adjust the data for the schedule and to compare it with the trends of the era in order to carry out relevant analyses.

Finally, most analyses based on EPAs draw on the public data of nflfastR. This database does not record formations and schemes, which means that raw EPAs cannot take into account the intent of a play. Sometimes a small gain is intentional.

Conclusion

EPA is a statistic well suited to evaluating collective performance or drawing conclusions from large volumes of data. It shows its limits as soon as one tries to assess individual players.

This stat can indicate that teams should pass more, especially in certain situations, but it cannot tell whether a coach’s decision to pass or run was right given the defense he was facing.

EPAs are a very practical tool for better understanding football, especially when working with data spanning several games or years. They are perhaps the most powerful publicly accessible data today for evaluating a team's real production on the field, but like any statistic, they take on even more meaning when set against other indicators.

October 8, 2022