Wednesday, July 12, 2006
Isolating Pitchers From Batters
I received an email from Benjamin Alamar, PhD, Editor, Journal of Quantitative Analysis in Sports, directing me to a research paper posted here:
http://www.bepress.com/jqas/vol2/iss3/4/
It discusses the Run Expectancy matrix. I invite the authors of that paper to discuss or refute my points here.
Disclosure: I have corresponded a few times with two of those authors, and therefore, you may consider my comments here biased towards them. They are not. As evidence, I have written a book with mgl, and we don’t pull punches towards each other. I just call ‘em as I see ‘em.
The paper properly surmises that:
The quantitative problem then becomes splitting the change in NERV in such a way that batters who put a ball in play that is hard to field gets a positive credit, even if a defensive player makes an unlikely play on the ball and the change in NERV for the entire play is negative.
The next statement however is a falsehood:
There is no way, for example, to determine using a NERV analysis whether a great pitcher or a great batter has more total effect on a team over the course of a season. Splitting NERV between batters and pitchers objectively, allows for the direct comparison of total NERV between pitchers and batters to determine which will have a greater impact on a team.
I’ll get back to this after I read the rest of the paper.
I was afraid they’d be going here:
Finally, the expected change NERV change is calculated and split between the batter and pitcher according to our estimated split (batter 62% and pitcher 38%).
This is also wrong, for reasons I will elaborate at the end.
Previous work that has utilized NERV has used tables that calculate the expectancy based upon only the number of outs and the configuration of men on base.
While this is for the most part true, I don’t like the word “only”. RE and WE matrices are based on whatever states you want to include in them. The 24-state matrix is the easiest to use, but it is certainly not the “only” one ever used.
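To make the point about states concrete, here is a minimal sketch of an RE table built from whatever state definition you choose. The field names (`outs`, `bases`, `slot`, `runs_rest`) and the four sample rows are invented for illustration; they do not come from the paper or from any real play-by-play file, so the resulting numbers mean nothing empirically.

```python
from collections import defaultdict

def run_expectancy(plate_appearances, state_key):
    """Average runs scored to end of inning, grouped by an arbitrary state."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [run sum, PA count]
    for pa in plate_appearances:
        entry = totals[state_key(pa)]
        entry[0] += pa["runs_rest"]
        entry[1] += 1
    return {state: runs / count for state, (runs, count) in totals.items()}

# Hypothetical rows: outs, base configuration, lineup slot, and runs
# scored from this plate appearance to the end of the inning.
sample = [
    {"outs": 0, "bases": "---", "slot": 8, "runs_rest": 0},
    {"outs": 0, "bases": "---", "slot": 9, "runs_rest": 1},
    {"outs": 0, "bases": "---", "slot": 9, "runs_rest": 2},
    {"outs": 1, "bases": "1--", "slot": 8, "runs_rest": 1},
]

# The familiar 24-state table keys on outs and bases only.
re24 = run_expectancy(sample, lambda pa: (pa["outs"], pa["bases"]))

# A richer table simply adds more fields to the key, here the lineup slot.
re_by_slot = run_expectancy(
    sample, lambda pa: (pa["outs"], pa["bases"], pa["slot"])
)
```

The same function serves both tables; only the key changes, which is why nothing ties RE to exactly 24 states.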
ForwardRuns was used as the dependent variable in a series of regressions to determine the best fit.
I don’t like this, but I understand that it’s close enough, and easier to program. Woolner did the same in BP 2005.
Table 1 looks a bit like what John Jarvis may have done. Again, I oppose regression being used here. A basic-Markov, or full-Markov, calculation should be implemented. Again, I understand the attraction to best-fit models.
For example, if I am reading it right, the lineup position’s starting RE is always going progressively lower. This is another falsehood, since the RE with the 9th batter (AL parks at least) is *higher* than the RE with the 8th batter, simply because RE is about runs to end of inning, and the 9th hitter is followed by the 1-2-3 hitters.
The charts in Figure 1 are beautiful. I’ve done similar charts, based on the 2004 BIS data I have. They look really cool when you split it up by batter handedness, as you can see the shift in positioning. You can almost infer positioning based on charts like this, for every player. I haven’t researched it yet, but I would bet Ichiro would be a good case study as a guy who is an exceptional baseball athlete who scores relatively low on fielding metrics. I wouldn’t be surprised that it’s his positioning that makes him end up worse than he should be based on his toolset alone. (Of course, positioning is a skill, but whether that skill is the manager or the fielder is up for debate.)
Section 4 I believe is where the problem lies. As I discussed in The Book, a .300 OBP pitcher facing a .400 OBP hitter results in the same performance expectation as a .400 OBP pitcher facing a .300 OBP hitter. It’s this whole “splitting up” issue that is the problem, something that plagues Win Shares, and I think is the problem in this paper.
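One formula that has this symmetry is the odds-ratio method. Whether it is exactly the calculation meant in the paragraph above is an assumption, and the .340 league OBP below is a made-up placeholder. A rough sketch:

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def matchup_obp(batter_obp, pitcher_obp, league_obp):
    """Expected OBP of a batter/pitcher matchup under the odds-ratio method.

    Assumption: this formula stands in for the matchup calculation;
    the post itself does not spell one out.
    """
    o = odds(batter_obp) * odds(pitcher_obp) / odds(league_obp)
    return o / (1.0 + o)

LEAGUE_OBP = 0.340  # hypothetical league average

good_hitter_vs_good_pitcher = matchup_obp(0.400, 0.300, LEAGUE_OBP)
poor_hitter_vs_poor_pitcher = matchup_obp(0.300, 0.400, LEAGUE_OBP)
```

The two results are identical because the batter and pitcher terms enter the formula as a product, so there is no natural "share" of the outcome to hand to either side.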
It is unclear to me how the results in Tables 3a and 3b could have happened. As the paper discusses on page 9:
If NERV had not been split according to the 62%/38% estimated in this paper, but rather using the arbitrary 50%/50% split used in standard analysis, the scores of the best pitchers would have been relatively higher and the scores of the best batters would have been relatively lower. The conclusion from that analysis would be that the best pitchers are relatively more important than the best batters. It is therefore important to make the estimated split in order to more accurately compare batters with pitchers.
I am certainly missing something, because I give 100% of the credit to the hitter, and 100% of the credit to the pitcher. (Remember, this is not the same as 50-50, even though it looks like I’m double-counting.) And I get results where the top 10 hitters are pretty much ahead of the top 10 pitchers.
Ok, now let me tell you why this splitting-issue is not the right way to proceed. And I’m going to use hockey as an example. In hockey, everyone on the ice for the scoring team gets a “plus 1”, and the players on the ice for the opposing team get a “minus 1”. This means that every goal has five pluses and five minuses. If we follow the logic in this paper, this would entail taking this one goal, and somehow splitting it up among the players on the ice. Perhaps giving the five guys on the scoring team a total of +.5 goals, and the five guys on the opposing team a total of -.5. And then, among the scoring team, deciding who gets the share of the +.5, like maybe +.25 for the goal scorer, +.10 for the playmaker, and +.05 for each of the other three guys on the ice.
Assume you have a player, whom I’ll call Obby Borr. He’s a +120 for the Bruins, and when he’s not on the ice, his teammates are +0. Also assume that Borr plays with everyone on the team. If you proceed with a “splitting” arrangement, Borr will end up being credited for something like +60 goals, if he’s lucky. Likely, under the splitting-system, he’ll be even lower. However, in my system, he gets +120.
You see, Borr plus his teammates is +120. All of his teammates are zero. Therefore, Borr plus zero is +120, making Borr equal to +120.
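The Borr arithmetic can be written out as a short sketch. The share values are the ones floated in the hockey example above; the function names are invented here, and treating the whole +120 as net goals that earn a fixed share each is a simplification made only for illustration.

```python
# Shares from the hypothetical splitting scheme in the hockey example.
TEAM_SHARE = 0.5  # total credit handed to the five skaters on the scoring team
ROLE_SHARES = {"scorer": 0.25, "playmaker": 0.10, "other": 0.05}

def split_credit(net_goals, share_per_goal):
    """Credit under a splitting scheme: a fixed fraction of each net goal."""
    return net_goals * share_per_goal

def own_universe_credit(on_ice_diff, teammates_diff_without_him):
    """Credit when the player is treated as his own universe:
    what happens with him on the ice minus what his teammates do without him."""
    return on_ice_diff - teammates_diff_without_him

borr_on_ice = 120            # Borr is +120
teammates_without_borr = 0   # his teammates are +0 when he sits

# Best case under splitting: Borr somehow collects his team's entire share.
best_case_split = split_credit(borr_on_ice, TEAM_SHARE)            # 60.0
# More likely: he only ever gets the goal scorer's share.
scorer_split = split_credit(borr_on_ice, ROLE_SHARES["scorer"])    # 30.0
# Treated as his own universe, he keeps the full difference he creates.
borr_rating = own_universe_credit(borr_on_ice, teammates_without_borr)  # 120
```

Under any choice of shares below 1.0, the split figure understates what the team actually gains by putting Borr on the ice.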
You have to treat each player as if he’s his own universe, and you adjust for the extra parameters. The same logic applies to strength of schedule, or how to credit the DP between the 2B and SS, and several other concepts.
Getting back to splitting up run expectancy (or win expectancy), you just don’t do it that way. Having the leading hitters at +25, as Table 3a shows, is simply wrong. If you add the performance of those players into any team (and take out an average hitter), that team will certainly score more than 25 more runs than otherwise. Splitting doesn’t work.
I must admit that most of the paper is over my head, as is Tango’s brief analysis herein. (OK, I just admitted to the world that I am not as smart as I make myself out to be sometimes - or at least as readers think I make myself out to be...).
However, I agree with Tango that the 100%/100% model is correct. The relative “responsibilities” of the batter and pitcher in the batter/pitcher matchup are simply based on the difference in the spread of skill among batters and then again among pitchers. If all pitchers were throwing BP fastballs down the middle, as in the HR derby, it is intuitively obvious that all of the responsibility in the outcome can be attributed to the batter. The reason is that the spread of talent among the pitchers is zero.
Now, we know that the spread of talent among pitchers in almost (maybe all, I don’t recall OFTOMH) all of the components (HR, K, BB, etc.) is smaller than among batters. HR rate is a good example. Batters’ HR rates go from around 3 a year to 40 a year (true rates). Pitcher HR rates go from around 10 to 30 (again, true rates, not observed rates).
Anyway, when we assign 100% of the “responsibility” to pitchers and to batters when we compile the NERV changes, the “real” relative rates of responsibility “come out in the wash” because we are always comparing or normalizing each player to the average of his group (batters or pitchers).
So in the example of the batting practice pitcher, in the long run (getting rid of random fluctuation) the sum of the NERV changes for all of the pitchers will automatically equal zero. IOW, all pitchers will be league average pitchers. If we have a league where there is some spread of pitcher skill, but not much - a lot less than batter skill - we will find that the spread of total NERV change among all the pitchers will again automatically be small - at least smaller than the spread of batter NERV change.
If I do it the traditional way, using 100% for batters and 100% for pitchers, I also get a larger spread for batters than for pitchers - IOW, the best batters will be better than the best pitchers.
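The "comes out in the wash" argument can be checked with a small deterministic sketch. Everything in it is an assumption made for illustration: the talent levels are invented, times on base above average stand in for NERV change, every batter faces every pitcher equally often, and the matchup model is a plain additive one rather than anything from the paper.

```python
from statistics import pstdev

LEAGUE_OBP = 0.340  # hypothetical league average

def expected_obp(batter, pitcher):
    """Simple additive matchup model (an illustrative assumption)."""
    return LEAGUE_OBP + (batter - LEAGUE_OBP) + (pitcher - LEAGUE_OBP)

def credit_totals(batters, pitchers, pa_per_matchup=10):
    """Give 100% of each expected change to the batter AND to the pitcher."""
    bat = [0.0] * len(batters)
    pit = [0.0] * len(pitchers)
    for i, b in enumerate(batters):
        for j, p in enumerate(pitchers):
            delta = (expected_obp(b, p) - LEAGUE_OBP) * pa_per_matchup
            bat[i] += delta  # times on base above average, for the batter
            pit[j] -= delta  # same event, opposite sign, for the pitcher
    return bat, pit

batters = [0.300, 0.320, 0.340, 0.360, 0.380]        # wide talent spread
bp_pitchers = [0.340] * 5                            # no spread: batting practice
real_pitchers = [0.330, 0.335, 0.340, 0.345, 0.350]  # narrow talent spread

bat_bp, pit_bp = credit_totals(batters, bp_pitchers)
bat_real, pit_real = credit_totals(batters, real_pitchers)

batter_spread = pstdev(bat_real)
pitcher_spread = pstdev(pit_real)
```

With the batting practice pitchers, every pitcher total lands at zero (up to floating-point rounding) even though each was handed 100% of every change. With the narrow-spread pitchers, `pitcher_spread` comes out smaller than `batter_spread`, and no 62/38 style split is needed to get there.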
So I don’t really get where these guys are coming from either, although I am pretty sure that at least one of them is smarter than I.
Maybe what we are missing is that if you use the batted ball types and characteristics (distance, speed, etc.) to determine the NERV change rather than the actual result of the play (s,d, etc.), you have to somehow use this 62/38 split or you won’t come up with the correct answer. I am not sure.