Tuesday, March 16, 2010
Simpson’s Paradox
Jack Morris v Bert Blyleven on BRef. When given 0-2 runs of support, Bert has a much lower ERA and much better win% than Jack Morris. When given 3-5 runs of support, Bert has a slightly better ERA, and barely worse win%. Given 6+ runs of support, he’s got a way better ERA, and somewhat better win% (though it’s hard to get much better win% when Morris is “bad” at 139-10).
Overall, however, Jack Morris has the better win%. That's because more of his games came with high run support than Bert's did. So even though Bert's win% is better than, or about even with, Morris's at each run-support level, when you add it all up, Morris has the better win%.
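The mechanism can be written out as a formula. A pitcher's overall win% is a weighted average of his win% in each run-support bucket, weighted by the share of his decisions that fall in that bucket. The notation is ours, not BRef's: k runs over the buckets (0-2, 3-5, 6+), W_k and L_k are wins and losses in bucket k, n_k = W_k + L_k, and N is total decisions.

```latex
w_{\text{overall}}
  = \frac{\sum_k W_k}{N}
  = \sum_k \frac{n_k}{N}\, w_k,
\qquad w_k = \frac{W_k}{n_k},
\qquad N = \sum_k n_k
```

Bert can win the comparison of w_k bucket by bucket and still lose the overall comparison if Morris has a larger share n_k/N of his decisions in the 6+ bucket, where almost everyone's w_k is high.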
I don’t know if this makes sense to everyone, but perhaps Poz can give us a 2000-word blog post that hammers this point home for everyone else.
Glove-slap: BtB


Jim Albert explains this well in his “Workshop Statistics” book (which, incidentally, is an excellent intro stats book written from a Bayesian perspective, and is available for free download as an e-book on his website).
He says something like: Player A had a better batting average than Player B in the first half and the second half of the season, yet ended up with a worse overall batting average for the total season.
An easy illustration:
First Half: Player A goes 50 for 100 (.500) and Player B goes 100 for 300 (.333)
Second Half: Player A goes 50 for 300 (.167) and Player B goes 10 for 100 (.100)
Total: Player A is 100 for 400 (.250) and Player B is 110 for 400 (.275)
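A quick way to check the arithmetic, and to see why it happens, is to treat each player's season average as his half averages weighted by the share of at-bats in each half. Player A took three-quarters of his at-bats in his weak second half, while Player B took three-quarters of his in his strong first half. Below is a minimal Python sketch that uses only the numbers from the illustration. The names `splits`, `avg`, `season_avg`, and `weighted_avg` are our own.

```python
from fractions import Fraction

# (hits, at-bats) per half, from the illustration above
splits = {
    "A": [(50, 100), (50, 300)],
    "B": [(100, 300), (10, 100)],
}

def avg(hits, at_bats):
    """Batting average as an exact fraction (avoids float rounding)."""
    return Fraction(hits, at_bats)

def season_avg(halves):
    """Season average pools hits and at-bats across both halves."""
    return Fraction(sum(h for h, _ in halves), sum(ab for _, ab in halves))

def weighted_avg(halves):
    """Same number, written as half averages weighted by share of at-bats."""
    total_ab = sum(ab for _, ab in halves)
    return sum(Fraction(ab, total_ab) * avg(h, ab) for h, ab in halves)

# Player A leads in each half...
for i in range(2):
    a, b = avg(*splits["A"][i]), avg(*splits["B"][i])
    print(f"Half {i + 1}: A {float(a):.3f}  B {float(b):.3f}")

# ...but pooling weights each half by at-bats, and B finishes ahead.
for player, halves in splits.items():
    assert season_avg(halves) == weighted_avg(halves)
    print(f"Season {player}: {float(season_avg(halves)):.3f}")
```

The same weighting logic applies to the Morris/Blyleven split: swap halves for run-support buckets and at-bats for decisions.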