THE BOOK--Playing The Percentages In Baseball


Thursday, September 14, 2006

Win Probability in Football

By Tangotiger, 01:32 PM

Pro Trade looks at win probability in football.  In baseball, the minimum set of states you need is inning, score, base, out.  In football, it’d be time to half, time to end of game, score, down, yards to 1st down, yards to goal.  The Pro Trade gang also added timeouts, which is neat.  As you can see, there are a lot more states to consider.  However, there are far fewer events to consider in football.  What I’d like to see from them is a Leverage Index chart.  Based on the chart in their article, it seems that there are more clutch situations, but who knows.


#1    Auntbea      (see all posts) 2006/09/15 (Fri) @ 10:05

Hey Tango.

Is there a chart on Protrade somewhere with even a simplistic version of their win probability?  Or is it all behind the scenes?


#2    Jim A      (see all posts) 2006/09/15 (Fri) @ 10:28

Based on my own experiments with Win Probability in Football (described in Pro Football Prospectus 2005), because there are so many more states than in baseball it’s much more difficult to get a reasonable enough sample size at any particular state to estimate the probability with great accuracy, particularly with only 256 games per NFL season.

This makes it absolutely critical to 1) use as many seasons as possible in your baselines (there’s no good football equivalent of Retrosheet); and 2) use statistical techniques to smooth out the raw data.  Otherwise, your results will end up completely goofy.  And even then, when taken down to the play-by-play level, the process is extremely computationally intensive, perhaps even infeasible without some reasonable assumptions.  Unfortunately, Protrade doesn’t describe their methodology at all.

It’s great to see other football analysts starting to use win probability, perhaps inspired by all the baseball WP analysis the past couple years.  Besides Protrade, you can also find WP-based football analysis at footballcommentary.com, sportsquant.com, and pigskinrevolution.com.


#3    tangotiger      (see all posts) 2006/09/15 (Fri) @ 10:30

Likely behind the scenes.  Might want to check with http://www.FootballOutsiders.com to see if they have one.

I have offered to create one, for free, but no one seemed interested.  (This was back when I had the time to do it.) It’s the old “you get what you pay for”.  I’ll have to charge $10,000 so people know I’m serious!


#4    tangotiger      (see all posts) 2006/09/15 (Fri) @ 10:33

Jim, your #2 is spot on.  And, that’s how I did it with baseball.  I start with the 24 base/out states, and from there, expand it to inning, score, base, out.  But, it’s all based on that one chart.
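To make the "24 base/out states" concrete, here is a minimal sketch that enumerates them. The expansion to inning and score at the end is only illustrative: the 9 innings, 2 half-innings, and score range of -10 to +10 are assumptions chosen for the example, not figures from this thread.

```python
from itertools import product

# Each base is either empty (False) or occupied (True): 2**3 = 8 base configurations.
BASE_CONFIGS = list(product([False, True], repeat=3))  # (first, second, third)
OUTS = [0, 1, 2]

# The core chart: 8 base configurations x 3 out counts = 24 states.
base_out_states = [(bases, outs) for bases in BASE_CONFIGS for outs in OUTS]

# Hypothetical expansion to inning and score (assumed ranges, for illustration only).
half_innings = 9 * 2
score_differentials = len(range(-10, 11))  # -10..+10 inclusive
expanded_state_count = len(base_out_states) * half_innings * score_differentials

print(len(base_out_states), expanded_state_count)
```

Everything beyond the 24-state chart is just a bigger index built on top of it, which is the point being made above.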

In football, you definitely have to do more “smoothing”, but, I still don’t see it as a big challenge.


#5    Jim A      (see all posts) 2006/09/15 (Fri) @ 11:32

Tango, that’s essentially how I did smoothing as well.  Smoothing points is particularly challenging in football because they are scored in non-continuous increments.  I would guess this would come into play more once you start talking about leverage and strategy.

For example, if a team scores a TD on the opening kickoff, in order to determine the benefit of going for two on the PAT, you have to estimate the WP of being ahead by 8 in the first quarter, a state for which you’ll have almost no historical data.  Even 9 is pretty rare, so you’ll probably have to smooth between 7 and 10 and also decide whether it’s linear.
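A minimal sketch of the smoothing step just described, assuming you already have win probability estimates for the well-sampled leads of 7 and 10. The numbers below are placeholders, not estimates from any data set, and `interpolate_wp` is a hypothetical helper. It fills in 8 and 9 linearly, which is exactly the assumption that would need checking.

```python
def interpolate_wp(lead, known):
    """Linearly interpolate win probability between the nearest known leads.

    `known` maps a score lead to a win probability for well-sampled states.
    Raises ValueError if `lead` lies outside the range of known leads.
    """
    if lead in known:
        return known[lead]
    lower = max(k for k in known if k < lead)
    upper = min(k for k in known if k > lead)
    weight = (lead - lower) / (upper - lower)
    return known[lower] + weight * (known[upper] - known[lower])

# Placeholder first-quarter values, for illustration only.
known_wp = {7: 0.70, 10: 0.76}
wp_8 = interpolate_wp(8, known_wp)
wp_9 = interpolate_wp(9, known_wp)
print(round(wp_8, 3), round(wp_9, 3))
```

A non-linear alternative would swap the `weight` line for some other curve between the two anchors; the structure stays the same.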

I’ve also considered building a Markov chain model (as Mark Pankin and others have done in baseball), though I’m not yet convinced it will be worth the effort.


#6    tangotiger      (see all posts) 2006/09/15 (Fri) @ 11:45

Right, the Markov chain would be the best way.  And, of course, it’s not worth the effort!  Everything I do is not worth the effort.  And yet, I do it for some strange sadistic reason.

Great links you provided:
http://www.footballcommentary.com/dynamicprogramming.htm

With the exception of the hurry-up offense, the Model assumes the transition probabilities are independent of the score and time.

That’s one way to reduce the number of states!  As well, they mention they split the field into 10-yard segments.  I don’t see any of that as necessary.

The important considerations are:
- short pass, long pass, run, FGA, punt

You need to “smooth” out by figuring out the chance of a short pass play being called.  That should be easy enough to do.  So, across all the yards, score, time, and down combinations, how often is a short pass called, from 0% to 100%?  Use the empirical data to come up with smoothed frequencies.  And do the same for the other types of plays.

(You can of course expand this to as many kinds of plays as you want.  And, the play should be based on the *intended* play, not what actually happened.  This of course would be harder to figure, since I doubt the PBP would have that.)

Then, for each one, you have to figure out how successful they are, and how many yards they pick up.

Once you have the frequency of each event for each state, and once you have the transition of state-to-state, that’s all you need.
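Here is a toy sketch of that idea: event frequencies plus state-to-state transitions, walked backward from the end of the game. It is deliberately much coarser than the play-level model described above. It works at the drive level, the state is only (drives left, score difference, possession), and the outcome frequencies in `DRIVE_OUTCOMES` are made-up placeholders that do not depend on the state.

```python
from functools import lru_cache

# Hypothetical per-drive outcome frequencies: points scored -> probability.
# Placeholder numbers for illustration, not measured from play-by-play data.
DRIVE_OUTCOMES = {7: 0.20, 3: 0.15, 0: 0.65}

@lru_cache(maxsize=None)
def win_prob(drives_left, diff, has_ball):
    """Win probability for team A.

    diff is A's current lead; has_ball is True when A has the next drive.
    Possession simply alternates after every drive.
    """
    if drives_left == 0:
        if diff > 0:
            return 1.0
        if diff < 0:
            return 0.0
        return 0.5  # assumption: a tie at the end is a coin flip
    total = 0.0
    for points, prob in DRIVE_OUTCOMES.items():
        new_diff = diff + points if has_ball else diff - points
        total += prob * win_prob(drives_left - 1, new_diff, not has_ball)
    return total

print(round(win_prob(1, 0, True), 3))
```

A play-level version would replace `DRIVE_OUTCOMES` with frequencies that vary by down, distance, field position, score, and time, but the recursion itself would not change.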


#7    Jim A      (see all posts) 2006/09/15 (Fri) @ 14:12

If you consider 99 yardlines x 60 minutes x 30 score differentials x 4 downs x 20 yards needed for a first down, you’ve already got more than 14 million possible states.  I don’t really like assuming score/time independence since so much of football strategy is based on score and time.  I took a drive-based approach of ignoring down and distance, and using 5-yard and 5-minute intervals to get down to about 7,000 possible states, which I believe is roughly equivalent to what is done in baseball.  This is also similar to what Bob Carroll, Pete Palmer, and John Thorn did in The Hidden Game of Football.  Then you’d have to make additional adjustments based on down, distance, and play type if you want it to be fine-grained enough at the play level.
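The arithmetic behind those two figures, as a quick check. The full-grid dimensions come straight from the comment above. The drive-based grid is one reading of "5-yard and 5-minute intervals" (20 field zones, 12 time blocks, the same 30 score differentials), so treat that breakdown as an assumption.

```python
from math import prod

# Full play-level grid, as listed in the comment.
full_grid = {
    "yard_lines": 99,
    "minutes": 60,
    "score_differentials": 30,
    "downs": 4,
    "yards_to_go": 20,
}
full_states = prod(full_grid.values())

# Drive-based grid: down and distance ignored, field and clock bucketed by 5.
# The bucket counts are an assumed reading of the description.
drive_grid = {
    "five_yard_zones": 20,
    "five_minute_blocks": 12,
    "score_differentials": 30,
}
drive_states = prod(drive_grid.values())

print(f"{full_states:,} play-level states, {drive_states:,} drive-level states")
```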


#8    tangotiger      (see all posts) 2006/09/15 (Fri) @ 14:38

I don’t share that concern.  My concern is the number of parameters, not the number of states.  Each parameter adds a dimension, so you’d have a 3-d or 4-d or 5-d, etc. array.  That’s what you really want to limit.  Within each dimension, you can have thousands of possible values, and it doesn’t really matter.  That there are 24 base-out states in baseball doesn’t make it easier to handle than if there were 240 base-out states.  It’s just a state-to-state transition matrix that you have to fill in, which you can generate empirically, or with a function.

Preferably, you want your event frequency to be independent of the inning/score states, which you can presume (mostly) safely in baseball, but you probably can’t in football (time/score). 

Even at the base/out level, most batter events are independent of those states, but in football (down/yards to go), that’s certainly not the case.


#9    tangotiger      (see all posts) 2006/09/15 (Fri) @ 14:40

Btw, that should be 3600 seconds, and the score differential more like 43 (+/- 21 points), giving us over 1 billion states!
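A quick check of that revised count, reading "+/- 21 points" as every score differential from -21 to +21 inclusive, and keeping the 99 yard lines, 4 downs, and 20 yards-to-go values from comment #7.

```python
yard_lines = 99
seconds = 3600                               # 60 minutes tracked by the second
score_differentials = len(range(-21, 22))    # -21..+21 inclusive
downs = 4
yards_to_go = 20

revised_states = yard_lines * seconds * score_differentials * downs * yards_to_go
print(f"{revised_states:,}")
```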


#10    Mark      (see all posts) 2006/12/08 (Fri) @ 17:10

Hey Guys,

Sorry I am a late addition to the thread - it was just forwarded on to me.

I head up our Research team here at Protrade and spearheaded the NFL win probability work. As you guys mention, there are far too many distinct states (in the billions) in the NFL to treat everything empirically.

In developing our model, we developed a lot of statistical equations to help us bridge the gaps between the possible states and the data set we had to work with. The main techniques employed were regression methods to help us determine the effects of different variables and states (e.g., the value of being up 8 versus 7). In that work we also looked at many, many second- and third-order effects. Finally, on top of the regression work, we have some Markovian aspects as well.

We are quite happy with our model, but are still continually improving it. If you have any questions, shoot me a line.

-Mark Kamal

