Thursday, March 01, 2012
Marcel 2012
This weekend.
Then I’m sure Fangraphs will have it integrated shortly thereafter.
Great study here. Go read it, because it came off a challenge I put out.
Now, we expect to see some difference, some tiny difference. It’s not like data from year T-4 is irrelevant. But HOW MUCH it means is always the question.
Here is what Rob got:
1. He started with his two matched groups, and he had one group at 0.06 FIP lower than the other.
2. In the out-of-sample data, we expected to have the same difference, but instead, the gap was now 0.19 FIP.
That is, the gap was 0.13 more than expected.
Why is there a difference? Because in T-4, one group had a FIP of 2.75, and the other had 4.55 for a gap of 1.80 runs.
Therefore, knowing there is a 1.80 larger delta in T-4 will yield a 0.13 larger delta in year T. That is, you need to weight year T-4 at 0.13/1.80 = 7%.
That is shockingly low. I was expecting a higher weight. I use a decay rate of 30% each year.
So, I give year T-1 a weight of 100%, year T-2 a weight of 70%, year T-3 50%, and year T-4 35%. As you can see, year T-4 is 35/(100+70+50+35) = 14% of the total weight. (There’s also some regression, which would bring it down.)
But perhaps a 40% decay rate makes more sense. In that case, the weights would be 100%, 60%, 36%, 22%, and T-4 is 10% of all the data. Add in regression, and, well, that explains Rob’s findings.
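The arithmetic above is easy to check in a few lines of Python. This sketch computes the implied T-4 weight from Rob's numbers, then the share of total weight that lands on year T-4 under a given annual decay rate. It ignores the regression component, so it only covers the "before regression" part of the argument.

```python
# Implied weight on year T-4 from Rob's matched groups (figures quoted above).
gap_t4 = 4.55 - 2.75        # FIP gap between the two groups in year T-4
extra_gap_t = 0.19 - 0.06   # year-T gap beyond the expected 0.06
implied_weight = extra_gap_t / gap_t4


def decay_weights(decay, years=4):
    """Weights for years T-1 .. T-years; each year is worth (1 - decay) of the more recent one."""
    return [(1 - decay) ** k for k in range(years)]


def oldest_year_share(decay, years=4):
    """Share of the total weight that falls on the oldest year (no regression)."""
    weights = decay_weights(decay, years)
    return weights[-1] / sum(weights)


print(f"implied T-4 weight: {implied_weight:.1%}")
for decay in (0.30, 0.40):
    rounded = [round(100 * w) for w in decay_weights(decay)]
    print(f"{decay:.0%} decay: {rounded}, T-4 share {oldest_year_share(decay):.1%}")
```

An exact 30% decay gives 100/70/49/34 rather than the rounded 100/70/50/35, but T-4 still carries about 14% of the weight; a 40% decay gives about 10%, against the roughly 7% that Rob's data implies.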
Conclusion: year T-4 should be severely underweighted, and its weight is consistent with a 40% annual decay rate.
Great job Rob! I’ll put this as one of the top research pieces of the year already.
Normally, we can only do this analysis with trades. But, going from an almost certain 50-game suspension to a 0-game suspension is just what we needed. The oddsmakers moved the Brewers from 35:1 to 25:1 after the news, which means that Ryan Braun is worth +.01 World Series over 50 games, or about +.03 World Series per season.
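The conversion from odds to World Series probability can be sketched as below. It assumes the quoted odds are fair (real sportsbook prices include a margin) and that Braun's value scales linearly from 50 games to a full 162-game season; both are simplifications.

```python
def implied_probability(against, for_=1):
    """Turn 'against:for' odds such as 35:1 into a probability (assumes no bookmaker margin)."""
    return for_ / (against + for_)


before = implied_probability(35)  # Brewers with Braun expected to miss 50 games
after = implied_probability(25)   # after the suspension was overturned
gain_50_games = after - before
gain_per_season = gain_50_games * 162 / 50  # assumes value scales linearly with games

print(f"{before:.3f} -> {after:.3f}: +{gain_50_games:.3f} per 50 games, "
      f"+{gain_per_season:.3f} per season")
```

This gives roughly +.011 per 50 games and +.035 per season, in line with the +.01 and +.03 above.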
If you listen to the gasbags, talking about the Brewers going from “no chance at winning” to “great chance at competing”, you’d have thought that the odds would have moved far more than that. Of course, it’s very political: if you want to make a big deal of Braun being out for 50 games, you frame it in terms of not winning the World Series. But the odds of that are low for almost every team anyway. Now, with him on the team, it’s more about just being able to have a chance at making the playoffs. Moving goalposts.
Anyway, if Braun is +.03 World Series, he’s probably close to +.24 playoffs.
I had a quick way to figure out the odds of making the playoffs:
Playoff odds (%) = PayrollIndex / 2 - 23
A league average team therefore would be 100/2-23 = 27% (which is the same as 8/30 naturally). So, Braun would change that to about 50% odds of making the playoffs for the Brewers (if he went from missing 162 games to missing none).
The 50% odds using the above formula would mean a team has a PayrollIndex of 146. So, a team payroll of 90MM$ has a 27% chance of making the playoffs, and a team payroll of 130MM$ has a 50% chance of making the playoffs. Braun is being valued at 40MM$ of equivalent payroll.
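A sketch of the playoff rule of thumb and its inverse. PayrollIndex is taken here to be 100 times a team's payroll divided by the league-average payroll of about 90MM$, which is how the numbers above read but is an assumption; the clamp to 0..100 is added here because the linear formula can otherwise go out of range.

```python
def playoff_odds_pct(payroll_index):
    """The post's rule of thumb: playoff odds in percent, clamped to 0..100."""
    return min(100.0, max(0.0, payroll_index / 2 - 23))


def payroll_index_for(odds_pct):
    """Invert the rule of thumb (valid only where it was not clamped)."""
    return (odds_pct + 23) * 2


LEAGUE_AVG_PAYROLL = 90.0  # MM$, the league-average figure used above (assumed index = 100)

index_50 = playoff_odds_target = payroll_index_for(50)    # PayrollIndex for 50% odds
payroll_50 = LEAGUE_AVG_PAYROLL * index_50 / 100          # equivalent payroll in MM$
braun_equivalent = payroll_50 - LEAGUE_AVG_PAYROLL        # payroll-equivalent of the odds jump

print(playoff_odds_pct(100), index_50, round(payroll_50, 1), round(braun_equivalent, 1))
```

The inverse gives an index of 146, about 131MM$ of payroll, and a payroll-equivalent of about 41MM$ for the jump from 27% to 50%, matching the rounded 130MM$ and 40MM$ above.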
These are all rough back-of-the-envelope calculations. This quick way would suggest that Braun should have changed the odds to 30:1 from 35:1, not all the way down to 25:1. (Or, they should have originally been at 30:1, not 35:1.)
Would love to see MGL or VegasWatch or someone run through a sim, and see what the odds look like with Braun missing 0, 50, or 162 games.
Glove-slap Vegas Watch.
One SD is 8.0 wins, which is right where it should be.
95.5 Phila.
94 Detroit
94 Texas
93 New York
89.5 Los Angeles
87.5 Tampa Bay
87.5 Boston
87.5 San Fran.
87 Cincinnati
87 St. Louis
85.5 Atlanta
84.5 Arizona
82.5 Miami
81.5 Toronto
81.5 Milwaukee
81.5 Colorado
81.5 Los Angeles
81 Wash.
78.5 Kansas City
77.5 Chicago
75.5 Cleveland
74.5 New York
74 Minnesota
73.5 Chicago
73 Pittsburgh
73 Oakland
72.5 Seattle
71 Baltimore
70.5 San Diego
62.5 Houston
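As a check on the 8.0 figure, a short sketch using Python's statistics module on the thirty projections listed above. The population SD comes out to about 8.05 wins (the sample SD is about 8.19). The projections total 2,438 wins, slightly more than the 2,430 a real season produces, which does not affect the spread.

```python
import statistics

# Win projections from the list above, in the same order.
wins = [95.5, 94, 94, 93, 89.5, 87.5, 87.5, 87.5, 87, 87, 85.5, 84.5, 82.5,
        81.5, 81.5, 81.5, 81.5, 81, 78.5, 77.5, 75.5, 74.5, 74, 73.5, 73, 73,
        72.5, 71, 70.5, 62.5]

print(len(wins), sum(wins))                 # number of teams, total projected wins
print(round(statistics.pstdev(wins), 2))    # population SD (all 30 teams)
print(round(statistics.stdev(wins), 2))     # sample SD
```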
In the THT 2012 Annual, Matt Swartz wrote a very good article on the different valuations for free agents, depending on whether a team re-signs the player, or another team signs the player.
We had a short thread on that article:
http://www.insidethebook.com/ee/index.php/site/comments/other_peoples_players/
One of Matt’s theses is that teams know more about their own players, such that if a team trades a player or lets him walk (I think Matt only looked at FA multi-year signings), he tends to do worse in the future than if the team keeps him. That makes sense of course. And other teams overvalue these players and consequently pay them too much, or at least more than the teams that keep their own players.
I think that Matt might have found a delayed effect, but I am not sure (he can correct me if my recollection is wrong). In other words, the poor performance did not show up until 1 or 2 years down the road, suggesting the team’s “inside information” has to do with the player’s aging curve (perhaps because of a player’s mental or physical traits or habits that a team is privy to).
He also found that the effect was much more pronounced in pitchers than in hitters.
Anyway, I looked at this phenomenon for players that changed teams and/or leagues: any change, not just FA signings. And I only looked at three years: the year before the switch (or non-switch; I also looked at players who did not switch teams, as a control group), year1; the year after the switch, year2; and the year after that, year3.
For batters, I used linear weights runs per 500 PA. For pitchers, I used component ERA (ERC), where 4.00 was “forced” to be each league’s (NL and AL) average in each year (basically, my ERC is 4.00 + runs allowed in linear weights per 9 innings). Likewise, batter runs per 500 PA always summed to zero for each league in each year.
For both batters and pitchers I computed a rough Marcel projection based on the 3 years prior to year 1 (the year before a player switched teams or not). This is the projection that I compared year2 performance with. The projection is for year2. Matt’s thesis would predict that players who switched teams under-performed their Marcel, and that the players who did not switch likely over-performed it slightly (since Marcel should be correct overall).
First, the batters
Same team
They were .6 runs (per 500 PA) better in year2 than their projection. However in year3, the year after the “non-switch,” they were basically right around their projection, after adjusting for their new age. Average age of these players in year2 was 29.2.
Same league, different team
They were 1 run worse in year2 than their projection. In year3, the year after the “switch,” they continued to be a little worse than their original projection, after adjusting for their new age in year3. Average age of these players in year2 was 31.7.
Different league, different team
They were 1.4 runs worse in year2 than their projection, worse than the “same league, different team” players, perhaps because they also had to face unfamiliar pitchers. In year3, the year after the “switch,” they continued to be worse than their original projection, after adjusting for their new age in year3. Average age of these players in year2 was 31.4.
The pitchers
Same team
They outperformed their projections slightly, by around .05 runs per game. In year3, they had normal aging, around .15 runs worse. Average age in year2 was 28.3.
Same league, different team
They performed around .1 runs worse than projected in year2. The next year, year3, they did terrible! They were .47 runs worse than in year2! This might be the large effect that Matt found with the pitchers. Average age was 31.3.
Different league
They actually performed better than expected, about the same as the pitchers who didn’t switch teams. The advantage from switching leagues (their opponents being unfamiliar with them) probably masked the same worse-than-expected performance we see in pitchers who switch teams but not leagues. If we look at year3, one year after the switch, they actually lose .26 runs rather than the .15 that the “same team” pitchers lose, so the edge from switching leagues seems to disappear, as might be expected. Average age in year2 of this group is 30.8.
Interestingly, if we look at year4, 2 years after the switch (or non-switch), the pitchers that stayed with their own teams did not do any worse than in year3 (actually this is not the same pool of pitchers, so only use it for comparison purposes), but the pitchers who did switch teams, including those that switched leagues, continued to do worse. The pitchers who switched teams and leagues lost another .24 runs in year4, suggesting that they continued to lose the advantage of switching leagues.
Anyway, it looks like Matt’s thesis is correct!
So there you have it…
In this great article in BP a few months ago, Mike Fast described a method by which he estimated catcher framing performance using Pitch f/x data. He was generous enough to provide a complete database for all catchers in 07-11.
From those numbers I computed an estimate of each catcher’s framing true talent by simply taking his total observed numbers and regressing toward the mean (zero) by adding 4500 called pitches (about 75 called pitches per game, BTW) of league average framing (zero of course), as he suggests in the article. I did not do any weighting by year, age adjustments or anything like that. I just used the combined 07-11 numbers that Mike provided. (BTW, I later learned that there was an error in Mike’s computations, so I multiplied his run values by .65, as per Mike.)
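A sketch of the regression step, assuming each catcher's data is a total of framing runs over a count of called pitches. The 4500 regression pitches, the .65 correction, and the 75 called pitches per game come from the text; the per-150-game conversion via called pitches is only an illustration, since the group averages below were prorated by PA caught. The catcher in the example is hypothetical.

```python
REGRESSION_PITCHES = 4500  # called pitches of average (zero) framing added, as the article suggests
RUN_CORRECTION = 0.65      # correction applied to Mike's run values
CALLED_PER_GAME = 75       # approximate called pitches per game, from the text


def true_talent_framing_runs(observed_runs, called_pitches):
    """Regress a catcher's observed framing runs toward zero over his own called pitches."""
    corrected = observed_runs * RUN_CORRECTION
    regressed_rate = corrected / (called_pitches + REGRESSION_PITCHES)  # runs per called pitch
    return regressed_rate * called_pitches


def runs_per_150_games(runs, called_pitches):
    """Express a run total over `called_pitches` on a per-150-game basis."""
    games = called_pitches / CALLED_PER_GAME
    return runs / games * 150


# Hypothetical catcher: +20 observed framing runs on 9,000 called pitches.
talent = true_talent_framing_runs(20, 9000)
print(round(talent, 2), round(runs_per_150_games(talent, 9000), 2))
```

Because the regression just adds zero-valued pitches to the denominator, a catcher with twice the 4500-pitch sample keeps two thirds of his (corrected) observed framing runs.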
To test his numbers, I first broke the list of catchers and their true talent framing skill into two groups of around 25 players each (an arbitrary number of players in each group): the best and the worst. The average framing skill in the best group, weighted by the number of PA they caught in 07-11, was +7.5 runs per 150 games, and for the worst group, it was -7.7. That is around a .05 runs per game influence, which would show up in their pitchers’ ERA, RA9, or ERC (component ERA). Only a part of that would show up in DIPS or FIP, since framing also influences BABIP.
Anyway, to test his numbers, I did a WOWY on those catchers. I looked at the results of all pitchers they caught when they were in the game and when they were not. I did not control for anything else, like park, batters, H/A, etc. A pretty standard WOWY analysis. We can thank Tango for that, BTW. I then looked at the WOWY differences in wOBA, SO, and BB rates.
I looked at 05-11 for some reason rather than just 07-11. So I used some in-sample data (07-11) and some out-of-sample data (05-06). The average catcher in the “good framing group” this time pro-rated to the number of PA they caught in 05-11 (rather than just 07-11) was +7.3 and for the “bad framing group”, -7.6, around the same as for 07-11. IOW, also around .05 runs per game.
Here are the results:
The good framing group had a wOBA difference of .008, or 8 points. IOW, looking at the same pitchers, when the good framing catchers caught them they allowed a wOBA 8 points lower than when some other catcher (a slightly bad framing catcher, on the average) caught them. That translates to around .24 runs per game, a lot more than we expected. The BB per PA had a .004 difference (around .15 fewer BB per game) and the K was .003/PA (.11 per game) more.
For the bad framing catchers, they had a .003 higher wOBA, or .09 runs per game, .11/game more BB, and .23/game fewer K. The runs per game number is also more than we expected.
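The per-game conversions above can be reproduced with two rough constants that are assumptions, not figures from the post: a wOBA scale of about 1.25 (divide a wOBA difference by it to get runs per PA) and about 38 PA per team per game. With those, the numbers line up with the .24, .09, .15 and .11 quoted.

```python
WOBA_SCALE = 1.25   # assumed: divide a wOBA gap by this to get runs per PA
PA_PER_GAME = 38    # assumed: plate appearances per team per game


def woba_gap_to_runs_per_game(woba_gap):
    """Convert a wOBA difference into runs per game."""
    return woba_gap / WOBA_SCALE * PA_PER_GAME


def rate_gap_to_per_game(per_pa_gap):
    """Convert a per-PA rate difference (BB/PA, K/PA) into events per game."""
    return per_pa_gap * PA_PER_GAME


print(round(woba_gap_to_runs_per_game(0.008), 2))  # good framers, in-sample
print(round(woba_gap_to_runs_per_game(0.003), 2))  # bad framers, in-sample
print(round(rate_gap_to_per_game(0.004), 2), round(rate_gap_to_per_game(0.003), 2))
```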
However, we expect to find much more of a WOWY effect in the in-sample data than is expected using the regressed in-sample framing data, because the actual framing performance of these good and bad framing catchers was much more spread out than the estimated true talent numbers (the regressed performance).
The total number of “min” PA was 302,434 for the bad framers and 88,738 for the good framers. So the standard error in wOBA is around 1.7 points for the good framers and .9 points for the bad framers. (That is not exactly how you do a standard error for a WOWY; in fact, the real SEs might be almost double since a WOWY is a difference between two numbers.)
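The standard errors above can be reproduced with the usual sigma-over-root-n formula if one assumes a single PA's wOBA value has a standard deviation of about .5; that figure is an assumption backed out from the numbers, not something stated in the post.

```python
import math

PER_PA_SD = 0.5  # assumed SD of a single PA's wOBA value


def woba_se_points(pa, per_pa_sd=PER_PA_SD):
    """Naive standard error of a wOBA average over `pa` PA, in points (.001)."""
    return per_pa_sd / math.sqrt(pa) * 1000


def difference_se(se_with, se_without):
    """SE of a difference between two independent averages (the 'with' and 'without' sides)."""
    return math.hypot(se_with, se_without)


good = woba_se_points(88_738)
bad = woba_se_points(302_434)
print(round(good, 2), round(bad, 2))
```

A WOWY difference also carries the error of the "without" side; if that side were as noisy as the "with" side, `difference_se` would give about 1.4 times the naive figure, which is part of why the real SEs are larger.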
Now, this is not such a great test because most of the data is in-sample (07-11). IOW, in the WOWY test, I used the same data that Mike used to come up with his catcher framing numbers. While he did not use the same method at all (WOWY), it is possible that there are some dependency issues.
The best way to test his numbers is to use out of sample data (and hope that the catchers had around the same skill that they had with the in-sample data).
So first I only used Mike’s data from 07-09 (and did the appropriate regression of course), and then I did a WOWY from 05-06 and 10-11 (4 years).
The average catcher in the bad framing group (based on only 07-09 framing numbers), prorated by the number of PA they caught in 05-06 and 10-11, was -8.9 per 150, and in the good group, +8.1. That is around .057 runs per game.
Here are the results of the out-of-sample WOWY. These numbers should be close to (rather than larger than) the true talent estimates, unlike the in-sample numbers.
Bad framers
wOBA diff: .09 runs/game
BB diff: .114 BB/game
K diff: .19/game
Good framers
wOBA diff: .03 runs/game
BB diff: .076 BB/game
K diff: .114/game
These numbers combined, (.03 + .09)/2, or .06 runs per game, are exactly in line with what we would expect from Mike’s numbers, which is very comforting. In fact, I love it!
Later today, I will do the same test on Max Marchi’s numbers, which were also derived from the pitch f/x data, but use a different method I think…
Some interesting points by Matt, when dealing with forecasting playing time.
If he can manage it, I’d like to see him include the Community forecasts:
http://tangotiger.net/survey/
(You can click “all players” at the bottom to get everyone, and of course, there’s the handy MLBAM Id for easy matching.)
This was an article I wrote at least ten years ago, if not older. But, still relevant, and perhaps it’ll inspire others to do something as well.
Clay gives a bit of detail on his forecasting system. As expected, K rates get far more weight in the recent seasons, while hit rates don’t.
***
I’d like to hear more about the “comps”, and the kind of adjustments he applies. Like I’ve said before when talking about PECOTA, you have to be pretty lucky to get the mid-20s Andruw Jones but very unlucky to get the early-30s Andruw Jones in your comp set.
Basically, WHY do we think that using comp players is necessarily a good thing? This is like using Line Drive data, just because it happens to be recorded by someone. So, you’ve comp-ed someone. Why does this necessarily help?
What it does do is give the forecast “a face”, but beyond that, I’m skeptical it has value. Indeed, no one has yet shown it does have value. And if it does have value, why not create an algorithm, rather than rely on the small number of comps and their observed performances?
This was quite a surprising claim from Colin:
These values are on a very different scale, since due to the lack of an intercept the values have to sum to one for the first regression and to three for the second regression, but they’re also very different in a more meaningful sense; recasting the first year to 1 (which is practically already done for us), we get weights of 1/.92/.90.
As you know, Marcel uses 5/4/3 for hitters (meaning 1, .8, .6) and 3/2/1 for pitchers (1, .67, .33). (I think it was 3/2/1… can’t confirm right now.)
I personally use .9994^daysAgo for hitters and .9990^daysAgo for pitchers, which has the effect of being 1, .8, .64, .512 (and so on, each 80% of the previous) for hitters, and 1, .7, .49 (and so on, each 70% of the previous) for pitchers.
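To see how a per-day discount maps onto per-season weights, here is a small sketch that treats each prior season as sitting 365 days further back. That is an assumption: real seasons are spread over six months, so actual weights blend a range of day counts.

```python
def season_weights(daily_factor, seasons=4, days_apart=365):
    """Weight of each prior season when data is discounted by daily_factor ** days_ago."""
    return [daily_factor ** (days_apart * k) for k in range(seasons)]


for label, factor in [("hitters", 0.9994), ("pitchers", 0.9990),
                      ("Colin-like", 0.9998), ("Colin-like", 0.9999)]:
    print(f"{label:10} {factor}: {[round(w, 2) for w in season_weights(factor)]}")
```

Under this assumption .9994 gives about 1, .80, .65, .52 and .9990 about 1, .69, .48, .33, close to the 80% and 70% chains described; .9998 gives about .93 per season, in the neighborhood of the 1/.92/.90 quoted from Colin.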
Tests from other research make me think that it should be even more aggressive, so maybe 1, .7, .5 for hitters and 1, .5, .25 for pitchers. But, I haven’t researched that, so, I’ll just leave it there for now.
Colin has gone way to the other side, essentially going with a .9998 or .9999^daysAgo kind of model.
Now, I agree with the framework for his testing, that you should and must include the PA component when establishing the weights. Frankly, this is an important step. When I did it for Marcel, I basically forced everyone in the system to have at least 300 PA, so that I didn’t have to worry about this portion too much (I should have worried a little about this, at least). Indeed, if you give everyone at least 500 PA for each of the three years, this step becomes basically unimportant (no worries at all). That’s because the weighting of each year (the PA of year 1 divided by the PA of years 1 + 2 + 3) will be the same for each wOBA of year 1, 2, and 3.
So, getting back to Colin’s important point: he’s saying that if you introduce the PA weighting component, we see that every year is important. I find this very hard to believe. I mean, it’s an exciting finding if true, and I’d like to see more research on this for sure. My guess at the moment is that there’s a selection bias issue involving guys with a limited number of years, or young guys.
Basically, does Colin’s finding apply across-the-board, or is it really limited to a subset of the population? I’d bet on the latter, and I’d bet that the Marcel 5/4/3 would still hold for players who are regulars. In any case, it’s an exciting prospect to consider.
***
A correction to Colin’s note here:
The third, and perhaps most important, takeaway has to do with regression to the mean. We can add a simplistic version of regression to the mean to our forecasting model by adding a TAv_REG of .260 (the league average) with a PA_REG of 1200. (The PA_REG comes from the Marcels; it’s included here mostly for the purposes of illustration. The regression component in PECOTA is a more rigorous model based on random binomial variance—again, the purpose here is only to illustrate the concepts.
Consider a player with 650 PAs in three straight seasons, or 1950 total PA. Using the Marcel weighting of 1/.8/.6, that comes out to 1560 effective PA— in other words, throwing out 20 percent of a player’s PAs during that time period. That means 56 percent of a player’s forecast comes from his own performance, and 44 percent comes from the regression to the mean component. Using weights of 1/.92/.90 yields 1833 effective PA, throwing out only about six percent. Using the same regression component, that’s 60 percent of a player’s forecast coming from his own production and only 40 percent coming from regression to the mean. (And if you follow from the conclusions above and start using more years to forecast a player as well, even less regression to the mean is necessary.)
There’s a calculation error in there. Marcel uses a 5/4/3/2 model, with the 5/4/3 being the weights for years T, T-1, T-2, and the 2 being the weight for regression toward the mean (using 600 PA as the seasonal number). So, if you had say 700 PA in year T, 400 in year T-1, and 500 in year T-2, you get these effective weights:
year T: 700 x 5
year T-1: 400 x 4
year T-2: 500 x 3
regression: 600 x 2
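That 600x2 is the same for everyone. Colin’s calculation error is that rather than using 5/4/3, he used 1/.8/.6 against the same 1200 PA of regression, which would also need to shrink by a factor of 5 (to 240 PA). The net effect is that he is showing a far bigger regression amount than Marcel is actually doing.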
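A sketch of just the PA-weighting portion of Marcel (not the age adjustment or the rate calculation), showing how much of the projection comes from the player's own data in Colin's 650/650/650 example under each set of weights.

```python
def own_share(pa_by_year, year_weights, reg_pa=600, reg_weight=2):
    """Fraction of the projection that comes from the player's own PA (PA weighting only)."""
    own = sum(pa * w for pa, w in zip(pa_by_year, year_weights))
    regression = reg_pa * reg_weight
    return own / (own + regression)


pa = [650, 650, 650]  # years T, T-1, T-2 (Colin's example)

marcel = own_share(pa, [5, 4, 3])                                    # 7800 / (7800 + 1200)
as_computed = own_share(pa, [1, .8, .6], reg_pa=1200, reg_weight=1)  # 1560 / (1560 + 1200)
rescaled = own_share(pa, [1, .8, .6], reg_pa=600, reg_weight=2 / 5)  # regression shrunk by 5 too

print(f"Marcel 5/4/3/2: {marcel:.0%} own data")
print(f"1/.8/.6 with 1200 PA of regression: {as_computed:.0%}")
print(f"1/.8/.6 with regression rescaled: {rescaled:.0%}")
print(f"700/400/500 example: {own_share([700, 400, 500], [5, 4, 3]):.0%}")
```

Marcel's 5/4/3/2 leaves about 87% of the projection to the player's own data in this example, versus about 57% (the 56 percent quoted) in Colin's calculation; dividing both the year weights and the regression weight by 5 gives the same 87%.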
Brian doesn’t blindly follow his off-the-wall forecast. Good for him.
I looked at the last 10 first basemen (born no later than 1973) to have had at least 10 WAR from age 25-27. These 10 players averaged 14.8 WAR, right in line with Prince’s 14:
WAR Born Player
17.7 1973 Todd Helton
10.9 1973 Mike Sweeney
14.8 1970 Jim Thome
19.4 1968 Jeff Bagwell
18.3 1968 Frank Thomas
17.3 1964 Will Clark
13.2 1964 Rafael Palmeiro
15.2 1963 Fred McGriff
10.9 1963 Mark McGwire
10.1 1963 Cecil Fielder
I got a chuckle at #10 on the list.
Anyway, how did these guys do over the next 9 seasons? I added a column to the above chart called “WAR9”, which is the number of WAR from age 28-36:
WAR9 WAR Born Player
37.2 17.7 1973 Todd Helton
10.6 10.9 1973 Mike Sweeney
40.5 14.8 1970 Jim Thome
50.6 19.4 1968 Jeff Bagwell
33.9 18.3 1968 Frank Thomas
26.0 17.3 1964 Will Clark
41.9 13.2 1964 Rafael Palmeiro
23.1 15.2 1963 Fred McGriff
43.9 10.9 1963 Mark McGwire
4.8 10.1 1963 Cecil Fielder
The average is 31 WAR. If we start a player at 4.8 WAR, and gradually accelerate his aging, we get this kind of aging chart, along with the cost per win (starting at 5MM$ per win, and increasing at 5% each year):
WAR $perW Value
4.8 $5.00 $24.0
4.7 $5.25 $24.7
4.5 $5.51 $24.8
4.2 $5.79 $24.3
3.8 $6.08 $23.1
3.3 $6.38 $21.1
2.7 $6.70 $18.1
2.0 $7.04 $14.1
1.2 $7.39 $8.9
The total comes in at 9 years, 183MM$.
If we take out Cecil Fielder (for whatever reason you want), the other 9 comps average out to 34.2 WAR, and that would work out to 201MM$.
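The aging chart can be reproduced under the pattern the table appears to follow: WAR drops by .1, then .2, then .3, and so on, while the cost per win starts at 5MM$ and grows 5% a year. The exact total comes out to about 182.97MM$, which rounds to the 183MM$ above; the year-by-year values match the table after rounding.

```python
def aging_scenario(start_war=4.8, years=9, start_price=5.0, price_growth=0.05):
    """Rows of (WAR, MM$ per win, value in MM$) for the aging chart above."""
    rows = []
    war, price = start_war, start_price
    for year in range(years):
        if year > 0:
            war -= 0.1 * year           # each year's decline is 0.1 WAR bigger than the last
            price *= 1 + price_growth   # cost per win grows 5% a year
        rows.append((war, price, war * price))
    return rows


rows = aging_scenario()
for war, price, value in rows:
    print(f"{war:4.1f} ${price:5.2f} ${value:5.1f}")
total_war = sum(r[0] for r in rows)
total_value = sum(r[2] for r in rows)
print(f"total: {total_war:.1f} WAR, {total_value:.0f}MM$")
```

Scaling that total by 34.2/31.2 (the no-Fielder average over the 31.2 WAR in this chart) gives roughly 201MM$, consistent with the no-Fielder figure, though the post does not say exactly how that number was derived.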
So, we can create some reasonable scenario where the overpay is some 13MM$ to 33MM$, rather than the 50-100MM$ being discussed.
Anyone going to step up? Anyone? The hard part is collecting all the data, and matching all the players. If someone ELSE does all that hard work, I can step in and do the rest.
The test is pretty simple.
1. Calculate wOBA for every forecast, and for the actuals. I’ll do something simple like
numerator = 0.7*BB + 0.9*1B + 1.3*(2B+3B) + 2*HR
denominator = BB+AB
It really doesn’t matter much what you do here. You just need something that focuses on the important stats, and you need to make sure everyone forecasted those stats.
2. Calculate each population mean, by weighting by actual PA (AB+BB). For missing players, either give them the population mean (you HAVE to do this for Marcel, since by definition, Marcel has no missing players), or set the wOBA at 20 points below the rest of the population mean.
3. Recalculate new population mean (where applicable).
4. Baseline each player to a common mean (set to .330, but it doesn’t really matter what you set it to). So, if the pop mean in #3 is .327, and you have a player forecasted for .377, his adjusted forecast is .380.
5. Calculate the difference for every player.
6. Present the average absolute difference, and the RMSE, and in both cases, weighted by the actual PA.
That’s it, that’s the basis.
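A sketch of steps 1 to 6 in Python, under a few assumptions not spelled out in the post: each system's forecasts and the actuals live in plain dicts keyed by a player id (a hypothetical structure; in practice the MLBAM Id would do), and the actual wOBAs are baselined to the same .330 as the forecasts so the differences compare like with like.

```python
import math


def simple_woba(bb, singles, doubles_triples, hr, ab):
    """Step 1: the simplified wOBA from the post."""
    return (0.7 * bb + 0.9 * singles + 1.3 * doubles_triples + 2 * hr) / (bb + ab)


def baseline(value, pop_mean, target=0.330):
    """Step 4: shift a wOBA so the population mean lands on `target`."""
    return value + (target - pop_mean)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def evaluate(forecast, actual, missing_penalty=0.0, target=0.330):
    """
    forecast: {player_id: forecast wOBA} for one system.
    actual:   {player_id: (actual wOBA, actual PA)}, PA = AB + BB.
    missing_penalty: 0.0 gives missing players the population mean;
                     0.020 puts them 20 points below it (step 2).
    Returns (PA-weighted mean absolute difference, PA-weighted RMSE).
    """
    players = list(actual)
    pa = [actual[p][1] for p in players]

    # Step 2: population mean of the forecast players, weighted by actual PA.
    seen = [p for p in players if p in forecast]
    pop_mean = weighted_mean([forecast[p] for p in seen], [actual[p][1] for p in seen])
    filled = [forecast.get(p, pop_mean - missing_penalty) for p in players]

    # Step 3: recompute the mean now that everyone has a forecast.
    forecast_mean = weighted_mean(filled, pa)
    actual_values = [actual[p][0] for p in players]
    actual_mean = weighted_mean(actual_values, pa)

    # Steps 4-5: baseline forecasts and actuals to the same mean, then take differences.
    diffs = [baseline(f, forecast_mean, target) - baseline(a, actual_mean, target)
             for f, a in zip(filled, actual_values)]

    # Step 6: PA-weighted mean absolute difference and RMSE.
    mae = weighted_mean([abs(d) for d in diffs], pa)
    rmse = math.sqrt(weighted_mean([d * d for d in diffs], pa))
    return mae, rmse
```

A forecast that is off only by a constant league-level shift scores zero here, which is the point of step 4; `baseline(0.377, 0.327)` reproduces the .380 example.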
Then you can do fun stuff, like splitting by career performances. Guys with 1500 or more PA in the last 3 years, guys with fewer than 250 PA, guys who had a .380+ wOBA in the last three years, shortstops, etc, etc. Look for whatever attribute of a player you want. And compare the systems, and look for bias.
Bueller?
This time, it’s Clay’s turn.
I’m playing with my son right now, so I can’t comment much. I see the Rangers have the lowest runs allowed in the AL, which would seem hard to do in that park. And the spread in wins seems wide. Can someone calculate the standard deviation of wins? A good forecast should have one SD = 9 or so.
That’s the implication here. But, seeing that the quality of pitchers in the starters group FAR exceeds that of the relievers group, it wouldn’t necessarily be a starter/relief thing, but a good/bad pitcher thing.
I’ve mentioned in the past that it’s more likely that good players don’t age as fast or peak as early as bad players. That’s part of what makes them good. Plenty of players peak at age 21, and chances are, they weren’t that good.
Matt shows this:
Estimator (N=1,576 pitchers)   RMSE of Statistic with Next Year’s ERA (2006-2011)
SIERA 1.126
Marcel 1.132
PECOTA 1.141
ZiPS 1.143
xFIP 1.148
FIP 1.212
tERA 1.236
ERA 1.387
First of all, there’s no need to go to three decimal places. We show ERA to two decimal places, so why bother showing its RMSE to three? As far as I’m concerned, there’s virtually no difference among the top five.
Secondly, I can’t tell if the future ERA is park-adjusted or not. It MUST be unadjusted. NO ONE is trying to estimate a pitcher’s park-neutral ERA in terms of testing. The only test is how he actually did. So, we don’t adjust for park and strength of schedule and innings per start.
(MGL for example only cares about park-neutral. And that’s fine. But then, we can’t test his results. SIERA is park-neutral I think, but FIP is not. All the forecasting systems in fact are park-specific. You can’t turn everything to park-neutral first.)
You COULD make the case that we should throw out any unexpected starter-relief switches, for reasons we’ve learned about over the years. But, we need to be careful here, as we may end up with a selection bias.
In the comments, Matt notes that it was park-adjusted. Again, I completely disagree here. The test is against actual performance, not adjusted performance. He notes it didn’t make a difference. Well, given that the test is slanted toward SIERA, and SIERA is Matt’s baby, then, I’d REALLY like to see the results the right way.
Now, Matt may decide to introduce a park-specific SIERA, so that we can all make the apples-to-apples comparison. Until then, SIERA will simply have to have its hand tied behind its back.
Thirdly, for the RMSE test, you MUST calibrate it so the league average for the forecast equals the league average for the actuals. It should be clear that if you treat the forecasting system as its own universe, it’s irrelevant if the expected ERA was set to 3.9 or 4.3 and the actual ended up at 3.7 or 4.8 or whatever. I’m not sure if Matt handled this.
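To illustrate the calibration point, here is a small sketch with made-up numbers. Weighting by innings is an assumption, since the table above does not say how the pitchers were weighted. Shifting a system's whole league level changes its raw RMSE but not its calibrated RMSE.

```python
import math


def weighted_rmse(pred, actual, weights):
    total = sum(weights)
    return math.sqrt(sum(w * (p - a) ** 2 for p, a, w in zip(pred, actual, weights)) / total)


def calibrated_rmse(pred, actual, weights):
    """RMSE after shifting forecasts so their weighted mean matches the actuals' weighted mean."""
    total = sum(weights)
    shift = sum(w * (a - p) for p, a, w in zip(pred, actual, weights)) / total
    return weighted_rmse([p + shift for p in pred], actual, weights)


# Hypothetical forecast ERAs, actual ERAs, and innings used as weights.
pred = [3.50, 4.00, 4.50]
actual = [3.80, 4.60, 4.90]
innings = [180, 150, 120]
higher = [p + 0.4 for p in pred]  # same system, league level set 0.40 higher

print(weighted_rmse(pred, actual, innings), weighted_rmse(higher, actual, innings))
print(calibrated_rmse(pred, actual, innings), calibrated_rmse(higher, actual, innings))
```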
As we know, RMSE, not correlation, is the correct test.
Having said all that: great job to Matt!
I ran four competitions, three unofficial, and one official. I’ll run them all down. I’m going to list the results of all the pro forecasters who finished ahead of Marcel. For those that finished below Marcel, I will list them in alphabetical order.