THE BOOK--Playing The Percentages In Baseball


Tuesday, November 11, 2008

Best performances of 2008

By Tangotiger, 10:21 AM

Sky gives us his list.  It is probably closest to how I would have done it.  Personally, I would have included Chone’s and the Fans Scouting Report for fielding.  It looks like the “Off” column is a combination of offense and replacement level, with offense including park adjustments.  Here’s what might be better:
Hit Run Park League Repl Pos Fld

This way, if someone disagrees with the Park valuation, he can easily change that, without throwing everything out.  As it stands, it’s not readily apparent what the park adjustment is for Holliday or Mauer.  How much did Beltran get for his running game?  Don’t like it?  Change that part, leave the rest as-is.

Also, I’m not sure if the Off uses the correct run value for the out.  Since Sky is a regular here, I’ll presume he did it right.

Those nitpicky things aside, a great presentation, and an excellent series overall.


#1    Rally      (see all posts) 2008/11/11 (Tue) @ 10:54

This is good.  Even I wouldn’t use my numbers for fielding at least for this purpose - I’ve published fielding projections but Sky is looking just at what they did in 2008.  And right now, my fielding projections use the same input he has, average zone ratings, for 2008.


#2    Sean      (see all posts) 2008/11/11 (Tue) @ 12:52

Aren’t these just Justin’s Total Value rankings, which have been available for a while now?  It’s cool that Sky is bringing them to what is probably a larger audience, though.


#3    Andy L      (see all posts) 2008/11/11 (Tue) @ 14:28

Best performance of 2008? Barack Obama FTW!


#4    Sky      (see all posts) 2008/11/11 (Tue) @ 16:05

Thanks for the link and kind words, Tango.

The numbers are from Justin’s site, available to anyone.  We worked out the methodology last fall based on the work/advice of Tango, MGL, Patriot, Sean, and many others.  I’m glad Justin’s the one publishing the numbers for two reasons: it saved me the work and I can pimp them all I want without seeming self-serving.

Good idea on splitting up the data even more, Tango.  And I may eventually work in John Walsh’s 2008 OF arm ratings.  Is there any good work being done to rate abilities of turning double-plays?


#5    Sky      (see all posts) 2008/11/11 (Tue) @ 16:12

Also, does anyone have a good (either thorough or extremely concise) explanation for why position adjustments should be done by relative defensive value instead of offensive positional averages?  That’s one part of these stats that everyone questions, and I’d like to point readers somewhere instead of typing up something that’s less than convincing every time.


#6    Colin Wyers      (see all posts) 2008/11/11 (Tue) @ 16:26

Here’s why.

In 1961, center fielders as a group hit .273/.351/.434, compared to .271/.341/.440 for left fielders. Do we think that left fielders are more valuable than center fielders for that year, given the same defensive contribution relative to position? Not at all.

The average defensive left fielder is less valuable than the average defensive center fielder, given the same offensive production, because we know - know! - that the average defensive CFer can go out there and be an above-average defensive left fielder. That doesn’t change simply because this year Willie Mays, Mickey Mantle and Hank Aaron are all playing center field and are around their offensive peaks.


#7    Tangotiger      (see all posts) 2008/11/11 (Tue) @ 16:29

Because if you do it by offensive positional average, then by default, you are forcing the overall value of the average LF to match the overall value of the average RF to match the overall value of the average 2B to match the overall value of the average SS, and so on.

This is hardly true.


#8    Tangotiger      (see all posts) 2008/11/11 (Tue) @ 16:31

Or what Colin said.

And in the late 40s or early 50s, the average CF was a better hitter than the average 1B.


#9          (see all posts) 2008/11/11 (Tue) @ 20:30

Sky--

Doesn’t the fielding bible rate double play turning ability as one of its categories? I don’t subscribe to BJO and haven’t bought the book, but I seem to recall them tracking it.


#10    Sky      (see all posts) 2008/11/11 (Tue) @ 20:41

Dan, I don’t know.  I’m guessing not, though.


#11    Jeremy      (see all posts) 2008/11/11 (Tue) @ 23:25

Sky, it does track double plays for middle infielders, bunts for corner infielders, and arm for outfielders. Also, do you include baserunning, such as Baseball Prospectus’s EQBRR or BJO’s net gain? Just trying to think of ways to improve the ratings.


#12    Sky      (see all posts) 2008/11/11 (Tue) @ 23:56

Jeremy, really?  That’s fantastic.  And something more people should be aware of.

Yes, there are many additional things that would be added, ideally.  With my tech skills, play-by-play isn’t an option, unless it’s done by others.


#13          (see all posts) 2008/11/12 (Wed) @ 00:34

Ha! ;-)


#14    jinaz      (see all posts) 2008/11/12 (Wed) @ 00:48

Just thought I’d say thanks to Sky for running this series.  I had planned to do something similar for a while and publish it at THT, but I’m still buried by work and such.  Sky’s series gave the data a proper vetting (and good visibility to boot).  And he did a nice job of pointing out places where the data were likely misleading or prone to misinterpretation.

One thing I will say in response to Tango’s comments: part of the reason for showing only the columns we did is that further detail (e.g. park factors, league adjustments, etc.) might blind people with numbers.  Keeping it as simple as possible (offense + fielding + posadj) helps keep this approachable.  If anyone wants the spreadsheet so they can pick apart the methods, I have no problem sharing it (with minor conditions).  Just fire me an e-mail.  Methods were also laid out on my blog in my player value series last offseason, and I’ve tried to stick to those as much as possible.  They’re built upon Patriot’s and Tango’s work, mostly.
-j


#15    Jeremy      (see all posts) 2008/11/12 (Wed) @ 01:16

Sky, also, if you’re going to do this for pitchers, I would suggest pRAA at statcorner, as, from what I can tell, it does account for the “probabilities of their batted balls, before knowing whether they’re turned into outs or not, with park adjustments,” and you can come up with your own replacement levels/leverages since it’s measured in runs.


#16    MGL      (see all posts) 2008/11/12 (Wed) @ 02:20

“Pos” are positional adjustments?  Are there all kinds of different numbers, even for players who play the same position, because of their playing time, or because they may have played multiple positions, or both?  IOW, if a player played multiple positions in the field, you include and prorate them all to come up with “pos”?

One minor thing. It would be nice to put each player’s primary or all of their defensive positions in the tables.


#17    jinaz      (see all posts) 2008/11/12 (Wed) @ 11:37

@MGL, yes, each position at which a player plays is pro-rated for the position adjustments, based on innings.  This unfortunately has the exception of DH’s, for which I don’t even get games played as DH info from my main source (Hardball Times data pages).  This messes up a few guys, like Jack Cust, pretty severely.  I don’t have a good way around it aside from manual intervention.

But if a guy spends 100 innings at 1B, 300 innings at SS, and 500 innings at 2B, those position adjustments will be assigned based on innings at each position and then summed up to get the overall position adjustment you see displayed in those tables.

Also, I do list primary position on the original data tables that Sky is pulling from.  Click on my name to get to that spreadsheet.  I probably could include all the other positions, but that’s a relatively low priority item for me compared to some of the other adjustments I want to make.  And given that it’s all excel based, it might be a bit of a pain to get automated (though I have some idea about how to do it...).
-j
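
A minimal sketch of the innings-prorated position adjustment described in comment #17, not Justin's actual spreadsheet logic. The per-season run values for each position and the 1,458-inning full season (162 games x 9 innings) are illustrative assumptions, and the function name is hypothetical.

```python
# Hypothetical full-season position adjustments, in runs.
# These values are assumptions for illustration, not the ones in the spreadsheet.
RUNS_PER_SEASON = {"1B": -12.5, "2B": 2.5, "SS": 7.5}

# Assumed definition of a full season: 162 games * 9 innings.
FULL_SEASON_INNINGS = 1458.0


def prorated_position_adjustment(innings_by_pos, runs_per_season=RUNS_PER_SEASON,
                                 full_season_innings=FULL_SEASON_INNINGS):
    """Prorate each position's adjustment by innings played there, then sum."""
    return sum(
        runs_per_season[pos] * innings / full_season_innings
        for pos, innings in innings_by_pos.items()
    )


# The example from the comment: 100 innings at 1B, 300 at SS, 500 at 2B.
example = prorated_position_adjustment({"1B": 100, "SS": 300, "2B": 500})
print(round(example, 2))
```

As the comment notes, DH time would need a separate data source; it is simply absent from this sketch.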


#18    Sky      (see all posts) 2008/11/12 (Wed) @ 16:40

Tango, just noticed your question of whether the correct value of the out was used.  I don’t know the answer to that, but I’m curious why you asked?  Justin used team-level BaseRuns to derive the linear weights, fyi.


#19    jinaz      (see all posts) 2008/11/12 (Wed) @ 17:15

Methods:

I use Patriot’s base runs spreadsheet to force a base runs equation that uses Tango’s coefficients (Base runs explained) to match 2003-2007 MLB data (it adjusts the B terms up or down to match the numbers).  Those equations are directly used for pitchers.  For hitters, I extract linear weights from that base runs equation (again, using Patriot’s spreadsheet, which uses calculus instead of the +1 method to do this). 

I actually don’t do team-level linear weights, because a) I’m interested in comparing players across teams and therefore it makes more sense to me to use league-level linear weights, and b) it just doesn’t make enough of a difference to be worth the massive increased complexity in the spreadsheet to have team-dependent linear weights.  We’re talking fractions of runs, at worst.

The resulting linear weights typically have larger values for individual offensive events (single = 0.51 runs instead of 0.46), and larger values for outs (-0.099 runs vs. -0.098...I’m using lwts_RC) than are reported in Tango’s base runs table.  I’ve always figured this was due to some combination of a higher run environment over the past several years (2003-2007) and the fact that I’m not including all the terms in my model that Tango does (THT doesn’t report ROE’s, for example), so the missing terms’ effects are generated by up-weighting correlated offensive events.

The reason I don’t go with B-Ref’s data, which does report ROE’s, is that THT’s fielding data is nicely detailed (in addition to being the only source of RZR data) and is easier to pull into a spreadsheet than B-ref’s.  And so I use their hitting data because the names all match up perfectly.  The only source I use other than THT is ESPN for the ZR data, and there are occasional issues with names not matching up between the two datasets.
-j
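
To illustrate what "extracting linear weights with calculus" means in comment #19, here is a rough sketch that takes the partial derivative of a BaseRuns equation with respect to each event. Everything specific in it is an assumption: the B-term coefficients are a simple, commonly quoted BaseRuns variant rather than Tango's full equation or the fitted version Justin describes, the league totals are made up, and the derivative is approximated numerically with a central difference instead of being derived symbolically as Patriot's spreadsheet does.

```python
def base_runs(ev):
    """Simplified BaseRuns: A*B/(B+C) + D. Coefficients are illustrative assumptions."""
    hits = ev["1B"] + ev["2B"] + ev["3B"] + ev["HR"]
    total_bases = ev["1B"] + 2 * ev["2B"] + 3 * ev["3B"] + 4 * ev["HR"]
    a = hits + ev["BB"] - ev["HR"]            # baserunners
    b = (1.4 * total_bases - 0.6 * hits - 3 * ev["HR"] + 0.1 * ev["BB"]) * 1.02  # advancement
    c = ev["OUT"]                             # batting outs (AB - H)
    d = ev["HR"]                              # automatic runs
    return a * b / (b + c) + d


def linear_weights(ev, step=1e-4):
    """Approximate d(BsR)/d(event) for each event with a central difference."""
    weights = {}
    for name in ev:
        up = dict(ev)
        up[name] += step
        down = dict(ev)
        down[name] -= step
        weights[name] = (base_runs(up) - base_runs(down)) / (2 * step)
    return weights


# Made-up team-season totals, roughly in a modern run environment.
LEAGUE = {"1B": 980.0, "2B": 300.0, "3B": 30.0, "HR": 170.0, "BB": 540.0, "OUT": 4120.0}

weights = linear_weights(LEAGUE)
for event, value in weights.items():
    print(f"{event}: {value:+.3f}")
```

These are absolute weights, so the out value comes out small and negative, which is why the choice of playing-time denominator comes up later in the thread.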


#20    Tangotiger      (see all posts) 2008/11/12 (Wed) @ 17:30

Justin:

If one player has 100 BsR with 400 outs in 600 PA and another has 100 BsR with 350 outs in 600 PA, what are their resulting “offense” numbers?


#21    terpsfan101      (see all posts) 2008/11/12 (Wed) @ 17:35

The values in Patriot’s Baseruns spreadsheet are a little bit high, so it’s no surprise that you would get results like .51 for a single. In the future, you should count SF’s and GIDP’s as regular batting outs. The SF is a worthless accounting category. GIDP’s shouldn’t be included separately unless you adjust for GIDP opportunities. Do you use separate Baseruns equations for each league? It would be a good idea to use separate equations.

Overall, your methodology is much better than Chris Dial’s total player rating system. At least you use a good run-estimator.


#22    terpsfan101      (see all posts) 2008/11/12 (Wed) @ 17:57

I meant that you should generate separate sets of Linear Weights for each league.

Good question, Tango. What is the correct way to convert absolute runs into runs above-average?


#23    jinaz      (see all posts) 2008/11/12 (Wed) @ 18:04

@terpsfan, I’m using Patriot’s spreadsheet but not his base runs equation.  I used Tango’s base runs equation (variables culled a bit to match my dataset), plugged it into Patriot’s spreadsheet, and then forced the base runs equation to match 2003-2007 MLB totals.  The main reason I use his spreadsheet is that it makes extracting the linear weights from the base runs equation automatic.

I do not include SF’s except to increase the number of batted outs above what AB’s would suggest.  GDP’s are included, though I see your point on opportunities...I’ll think on it, though I like knowing that additional outs were created.  That would seem important to getting league-wide numbers to match up to reality.

As for different equations for different leagues, while it may be “better,” in my experience it doesn’t matter much.  It’s sort of like using team-specific linear weights--there are minor differences in terms of the actual results.  And, when I was setting this up, I judged that it just wasn’t worth the effort to use different equations for different leagues.  I do, however, have an adjustment to replacement level depending on league.

One can argue on these same lines that I shouldn’t bother doing custom linear weights at all.  But given that I’m using a different dataset than Tango used, and especially that I have different variables and perhaps a slightly different model (again, not including things like ROE’s and such), I thought it was worth doing.  Besides, as an intellectual exercise, it was good for me to learn how to do it so that I understood what was going on under the hood.

The specific equations I’m using are laid out in the post linked in my name, if you’re interested. As far as I can remember, those are the equations I currently am using in my spreadsheet.

@Tango, I’m not sure what you mean by offensive numbers.  You mean lwts?  r/g?  RAR?  I’m guessing you mean RAR, because that’s what Sky described as offense?  In any case, I’ll have to get back to you on it because I need to run.  The equations I’m using are in the link, though, if you want to see what I’m doing.

Thanks,
Justin


#24    terpsfan101      (see all posts) 2008/11/12 (Wed) @ 18:21

I stupidly asked: “What is the correct way to convert absolute runs into runs above-average?”

Never mind, you are using RAR not RAA. I confused your metric with Chris Dial’s.

Jinaz, I certainly won’t argue with your inclusion of the GIDP. By itself, I don’t think it is a “context-neutral” statistic. Then again, the IBB, SH are not context-neutral categories either, and I include them in my run-estimators.


#25    Colin Wyers      (see all posts) 2008/11/12 (Wed) @ 18:41

For everyone’s benefit (or at least my tenuous sanity) here’s the last thread we had on converting between absolute runs and runs above average:

http://www.insidethebook.com/ee/index.php/site/comments/reconciling_linear_weights_and_runs_created/#comments

For the sake of brevity I’ll try and say something that (I hope) won’t get me killed, that should summarize the issue:

Whatever you reconcile your linear weights by is what you need to measure playing time with. If you are measuring playing time by PAs then your LWTS need to be reconciled at the PA level (which should give an out value of around -.14 for the modern offensive era).

It looks like jinaz reconciled by out instead. This is probably not a huge deal, but it is an area for potential improvement.


#26    jinaz      (see all posts) 2008/11/12 (Wed) @ 21:32

Frankly, my brain’s not up to that stuff tonight, so I might be missing something.  I’ll try to find time to read through the other thread tomorrow.

But I used outs as my denominator when calculating runs per game.  I then compare a player’s runs per game to a baseline (average or some fraction of average for replacement level--it varies by league) to calculate RAA or RAR.

I use outs because I was using absolute linear weights and not relative linear weights.  They do not include the full impact of outs, and therefore it’s important to measure playing time in terms of outs instead of PA’s.  I’m not sure if that’s what you’re referring to or not, Colin, because I didn’t keep up with that prior discussion.  But I was under the impression that this was the correct methodology...PA’s are never actually used in my estimations of R/G, RAA, or RAR.

So in tango’s question above, 100 absolute BsR (as I calculate them) in 400 outs would result in 100 / 400 * 26.25 outs/g = 6.6 runs per game.  And 100 absolute BsR in 350 outs would result in 100 / 350 * 26.25 outs/g = 7.5 runs per game.

To convert those to RAR, I do this:

(6.6 r/g - K * Lg) / 26.25 * outs = RAR

where K = your replacement level coefficient (I use 0.72 for the AL and 0.77 for the NL IIRC) and Lg = league average runs per game.  For Lg, I use league average for all position players rather than overall r/g because I don’t like to compare my position player hitters to a group that includes pitcher hitters.
-j
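
The arithmetic in comment #26 can be sketched as follows. The 26.25 outs per game and K = 0.72 come from the comment; the league average of 4.8 runs per game is a made-up placeholder, and the function names are hypothetical.

```python
OUTS_PER_GAME = 26.25  # from the comment above


def runs_per_game(bsr, outs):
    """Absolute BaseRuns scaled to a per-game rate, using outs as the denominator."""
    return bsr / outs * OUTS_PER_GAME


def runs_above_replacement(bsr, outs, league_rpg, k):
    """(player r/g - K * Lg) / 26.25 * outs, as written in the comment."""
    baseline = k * league_rpg  # replacement-level runs per game
    return (runs_per_game(bsr, outs) - baseline) / OUTS_PER_GAME * outs


# Tango's two hypothetical players: 100 BsR in 400 outs vs. 100 BsR in 350 outs.
LEAGUE_RPG = 4.8  # assumed league average r/g for position players (placeholder)
K_AL = 0.72       # AL replacement-level coefficient quoted in the comment

for outs in (400, 350):
    rpg = runs_per_game(100, outs)
    rar = runs_above_replacement(100, outs, LEAGUE_RPG, K_AL)
    print(f"{outs} outs: {rpg:.2f} r/g, {rar:.1f} RAR")
```

Note that the formula simplifies to BsR minus the replacement-level runs expected over the same number of outs, so the player who used fewer outs comes out ahead even though both produced 100 BsR.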

