Thursday, June 03, 2010
No, no, no, no, NO! (Part 2)
I don’t remember what my last no, no, no, no, no thread was about. It must have bothered me then, and this one bothers me now: running regressions of wins against salary.
I just don’t know what to say at this point anymore. I’ve got at least a dozen threads on this. Correlation increases as the number of games increases, because the random noise in a team’s record shrinks as the games pile up. It’s really that simple. There’s a huge difference between running a regression against 70 games and against 700 games. Every time I see one of these regressions, the OBSERVED winning percentage is implicitly treated as if it were the TRUE winning percentage.
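Here is a rough simulation of that point. It is only a sketch, and its assumptions are mine, not from any particular study: each team’s true winning percentage is drawn from a normal distribution centered at .500 with an SD of .060 (`talent_sd`), and every game is an independent coin flip weighted by that true rate. It measures how well the OBSERVED record tracks the TRUE one after 70 games and after 700 games. The function names are hypothetical.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def observed_vs_true_r(games, n_teams=30, talent_sd=0.06, trials=100, seed=1):
    """Average r between TRUE and OBSERVED win% when each team plays `games` games.

    Assumptions (illustrative only): true talent ~ Normal(.500, talent_sd),
    and each game is an independent coin flip at the team's true win%.
    """
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        # Draw each team's true talent, clamped to a valid probability.
        true_pct = [min(max(rng.gauss(0.5, talent_sd), 0.0), 1.0) for _ in range(n_teams)]
        # Play the games and record the observed winning percentage.
        obs_pct = [sum(rng.random() < p for _ in range(games)) / games for p in true_pct]
        rs.append(pearson_r(true_pct, obs_pct))
    return statistics.fmean(rs)


r70 = observed_vs_true_r(70)
r700 = observed_vs_true_r(700)
print(f"70 games: r = {r70:.3f}   700 games: r = {r700:.3f}")
```

Nothing about the teams’ talent changes between the two runs; only the number of games does. Under these assumptions the 70-game r should land somewhere around .7 and the 700-game r somewhere around .95.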
Even if God were to tell you the exact talent level of every single player in MLB, you would never be able to get r=0.9999 between talent and winning %. Not unless you’ve got one million games played.
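The same idea can be written down directly. If a team’s observed winning percentage over n games is its true rate plus binomial noise, then the correlation between true and observed is roughly r² = σ_T² / (σ_T² + 0.25/n), where σ_T is the spread of true talent across teams. The sketch below assumes σ_T = .060 and uses 0.25 as the per-game variance (the coin-flip maximum); both are simplifying assumptions, and the function names are made up for illustration.

```python
import math


def expected_r(games, talent_sd=0.06):
    """Approximate r between true and observed win% after `games` games."""
    # Assumption: per-game binomial variance taken as 0.5 * 0.5 = 0.25.
    noise_var = 0.25 / games
    talent_var = talent_sd ** 2
    return math.sqrt(talent_var / (talent_var + noise_var))


def games_needed(target_r, talent_sd=0.06):
    """Invert expected_r: how many games per team to reach `target_r`."""
    r2 = target_r ** 2
    return 0.25 * r2 / (talent_sd ** 2 * (1 - r2))


for n in (70, 162, 700):
    print(n, round(expected_r(n), 3))
print(round(games_needed(0.9999)))
```

With σ_T = .060 this works out to roughly 350,000 games per team for r = .9999 (about 500,000 if σ_T is .050), the same order of magnitude as the million games above.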
Please, guys, stop it with regression analysis. Apologies to Hawkonomics for using his/her post as my target practice. I otherwise enjoy that blog.


Wow, this is just a disaster of a sentence (a criminal misuse of the concept of “statistical significance”):
“Second, [when] I run a regression (including a constant term), I find that payroll for the 2010 season is statistically insignificant. In other words in statistical terms payroll has zero effect on winning percentage at this point in the season. From that I would conclude that payroll for the 2010 MLB season has zero impact on winning percent in the 2010 season.”
So we really need a statistical test to tell us whether payroll is related to winning percentage? If it weren’t, teams would have to be randomly paying for wins, if you know what I mean. In other words, they would have to be valuing players by throwing darts at a dartboard with wedges of 0 to around 5 WAR.
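To see why “statistically insignificant” is not the same thing as “zero effect”, here is a rough sketch of a league in which payroll DOES matter by construction. Everything in it is a hypothetical assumption chosen for illustration: 30 teams, a payroll effect of .020 in true winning percentage per standard deviation of payroll (`payroll_effect`), other true-talent variation with an SD of .050 (`other_sd`), and coin-flip games. It runs the same kind of regression (with a constant term) on the observed records and counts how often the payroll slope clears a two-sided 5% threshold.

```python
import random


def slope_t_stat(xs, ys):
    """OLS slope (regression includes a constant term) and its t-statistic."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return slope, slope / se


def share_significant(games, payroll_effect=0.02, other_sd=0.05,
                      n_teams=30, trials=400, seed=7):
    """Fraction of simulated seasons where the payroll slope is 'significant'.

    Hypothetical model: true win% = .500 + payroll_effect * payroll_z + other talent.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        payroll_z = [rng.gauss(0, 1) for _ in range(n_teams)]
        true_pct = [0.5 + payroll_effect * z + rng.gauss(0, other_sd) for z in payroll_z]
        true_pct = [min(max(p, 0.0), 1.0) for p in true_pct]
        obs = [sum(rng.random() < p for _ in range(games)) / games for p in true_pct]
        _, t = slope_t_stat(payroll_z, obs)
        if abs(t) > 2.05:  # approx. two-sided 5% critical value for 28 degrees of freedom
            hits += 1
    return hits / trials


print("55 games: ", share_significant(55))
print("162 games:", share_significant(162))
```

Because the effect is built in, every “insignificant” run is a case where the test missed a real relationship. Under these assumptions the 55-game share (a rough stand-in for where a season stands in early June) should come out well below half, and it should rise as the number of games grows.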