Monday, July 23, 2007
The fallacy of Pythagorean
Credit SABRMatt with opening my eyes to the impact.
Suppose you have a game like yesterday:
The Yanks went nuts and scored over 20 runs. Suppose that game was followed by one in which they were shut out. On average, they scored over 10 runs a game. Over the two games, they won 1 and lost 1. Doesn’t make sense, right?
Here, let’s make it more technical and perfect:
http://www.tangotiger.net/markov.html
Set the AB to “24”, and we get this line:
AVG / OBP / SLG
0.417 / 0.500 / 0.625
That tells us they will score 14 runs over a 9-inning game.
Now, set AB to a large number. You will obviously get this line:
AVG / OBP / SLG
0.000 / 0.000 / 0.000
And you can guess the number of runs in a game.
The first game, the .500 OBP game, means you were on base 27 times and made 27 batting outs. The second game, the perfecto, means you were on base 0 times and made 27 batting outs. After two games, you got on base 27 times and made 54 batting outs, for a 0.333 OBP. (A .333 OBP implies 4.7 runs per game.)
After two games, however, you scored 14 runs total, or an average of 7 runs per 9 innings.
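To make the arithmetic concrete, here is a minimal sketch in Python. The function and variable names are my own, OBP is simplified to times on base divided by times on base plus batting outs, and the 4.7 figure is simply copied from the text above rather than computed by the Markov tool.

```python
def obp(times_on_base, batting_outs):
    """Simplified on-base rate: times on base / (times on base + outs)."""
    return times_on_base / (times_on_base + batting_outs)

# Each game as (times on base, batting outs, runs scored), from the post.
games = [
    (27, 27, 14),  # the .500 OBP game
    (0, 27, 0),    # the perfecto
]

per_game_obp = [obp(on, outs) for on, outs, _ in games]

total_on = sum(on for on, _, _ in games)
total_outs = sum(outs for _, outs, _ in games)
total_runs = sum(runs for _, _, runs in games)

combined_obp = obp(total_on, total_outs)   # 27 / 81
avg_runs = total_runs / len(games)         # 14 / 2

# Quoted from the post, not derived here: a .333 OBP implies 4.7 runs.
RUNS_IMPLIED_BY_COMBINED_OBP = 4.7

print(f"per-game OBP: {per_game_obp}")
print(f"combined OBP: {combined_obp:.3f}")
print(f"average runs actually scored: {avg_runs:.1f}")
print(f"runs implied by combined OBP: {RUNS_IMPLIED_BY_COMBINED_OBP}")
```

The combined OBP and the average runs scored describe the same two games, yet they point to quite different run levels (4.7 versus 7).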
You see the disconnect here? Now, given a large enough number of games, all these wild and crazy games will balance out. But by large, I mean LARGE, not 81 or 162. I’m talking about several seasons’ worth.
For this reason, it makes no sense to use the average runs per game to establish the Pythagorean record. You should convert the runs figure down to something bases-like, or convert it up to something wins-like. A game where you score 14 runs will give you a winning record of around .900, and a game where you are perfected-out will give you a winning record of .000. The average of the two is .450. That is not quite the .500 we are looking for, but it is far better than the winning record of around .700 that would be implied by taking the average of 14 and 0 runs, and then converting to wins.
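Here is a rough sketch of that comparison in Python. The post does not say which exponent or opponent run level it used, so two things here are my assumptions: the classic Pythagorean exponent of 2, and an opponent who scores 4.7 runs per game (borrowed from the .333 OBP figure above). With those inputs the results land close to the numbers quoted.

```python
def pythagorean(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        return 0.5  # no runs either way: treat as a coin flip
    return rs / (rs + ra)

OPP_RUNS = 4.7       # assumption: opponent's runs per game
game_runs = [14, 0]  # the blowout and the perfecto

# Wins-like approach: convert each game to a win rate, then average.
per_game_pct = [pythagorean(r, OPP_RUNS) for r in game_runs]
avg_of_pcts = sum(per_game_pct) / len(per_game_pct)

# Cumulative approach: average the runs first, then convert.
avg_runs = sum(game_runs) / len(game_runs)
pct_of_avg = pythagorean(avg_runs, OPP_RUNS)

print(f"per-game win rates:     {[round(p, 3) for p in per_game_pct]}")
print(f"average of win rates:   {avg_of_pcts:.3f}")
print(f"win rate of average RS: {pct_of_avg:.3f}")
```

Under these assumptions, the average of the per-game win rates comes out near .450, while converting the 7-run average comes out near .690. That is the gap described above.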
So, the best solution is to convert to something OBP-like. The next best solution, very, very close behind, would be to convert to something wins-like. The third best solution, far behind, would be to stick to the cumulative runs scored and allowed figures.
Thanks Matt.