THE BOOK--Playing The Percentages In Baseball


Friday, June 06, 2008

Hardball Times Team Stats

By Tangotiger, 09:10 AM

I don’t know when THT rolled out Fangraphs WPA stats, but I like them.  They give you the team-level totals.  The Angels, for example, are a total of +6.5 wins (meaning their actual wins minus losses is 13 games… as you can see, the player performances are a perfect match to their team’s record).  That breaks down as -2.0 wins for batting, +4.7 for starters, and +3.7 for relievers.

Fangraphs is likely using a run environment that is too high, meaning it gives too much credit to pitchers.  The total for the 30 MLB teams should have batting as exactly 0.0 and starters+relievers as exactly 0.0.  Checking now… the total of the 30 MLB teams is -51 wins on offense and obviously +51 on pitching.  So, until David A. updates the run environment charts he uses for the win expectancies, you need to mentally add +51 wins per 70,000 or so PA to your hitters (i.e., about +0.1 wins per 200 PA), and remove 51 wins per 16,000 or so IP from your pitchers (about -0.2 wins per 70 IP).  As you can see, no big deal at the player level so far.
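For anyone who wants to apply that mental adjustment to a specific player, here is a minimal sketch of the arithmetic. The function and constant names are made up for illustration, the league totals are the rough figures quoted above (51 wins, about 70,000 PA, about 16,000 IP), and it assumes the imbalance is spread evenly per PA and per IP, which is a simplification.

```python
# Rough league-wide figures quoted in the post; approximations, not official totals.
LEAGUE_IMBALANCE_WINS = 51   # offense is -51 wins, pitching is +51 wins
LEAGUE_PA = 70_000           # approximate league plate appearances so far
LEAGUE_IP = 16_000           # approximate league innings pitched so far


def hitter_adjustment(pa):
    """Wins to ADD to a hitter's WPA, assuming the gap is spread evenly per PA."""
    return LEAGUE_IMBALANCE_WINS * pa / LEAGUE_PA


def pitcher_adjustment(ip):
    """Wins to add to a pitcher's WPA (negative), assuming an even spread per IP."""
    return -LEAGUE_IMBALANCE_WINS * ip / LEAGUE_IP


print(round(hitter_adjustment(200), 2))   # 0.15
print(round(pitcher_adjustment(70), 2))   # -0.22
```

Those come out to roughly the +0.1 and -0.2 quoted above, which is why the correction is hard to notice for any one player even though it adds up to 51 wins across the league.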

Also note that starters are way behind relievers, as they always are.  Relievers get the advantage in that their run environment is being compared against a fixed point (say 4.50 runs per game), when in fact relievers, because they are relievers, should actually be compared to a lower run environment.  But, the charts I provided David don’t allow for that to be handled.  I can do it, but it’s a pain in the butt. 

In any case, since you likely want to compare to replacement level, not average, you’ll have to make an adjustment anyway, so you might as well do it after-the-fact, and not in real-time.

Anyway, the reason I happened to discover THT’s WPA charts is that Rally was talking about the Mariners’ fielding.  It seems that Safeco is always at the center of issues with fielding stats.  They’ve got close to the league-low in both infield and outfield fielding.  It’s hard to believe that a team with Beltre and Betancourt and Ichiro can be that dismal.  Either the other players are so dreadful as to bring them all down, or one or all of these guys aren’t as good as their reps.

At some point, I’ll be looking at Safeco’s PBP data to see if there’s something strange with their data.


#1    Tangotiger      (see all posts) 2008/06/06 (Fri) @ 09:35

If Fangraphs drops the run environment by around 0.25-0.30 runs per game, we’d likely be in sync in terms of the off/def balance.

I know that David relies on the 2007 run environment as a proxy, until the 2008 season completes and he can do a rerun.  As I noted, no one’s going to notice a change at the individual level, so it doesn’t make much sense for him to constantly update the run environment.


#2    David Cameron      (see all posts) 2008/06/06 (Fri) @ 10:49

Mariners defense in a nutshell:

Betancourt’s gotten fat and has an error problem
Beltre’s made a lot of errors already
Ibanez and Sexson can’t move
Lopez is below average
Wilkerson (and now Balentien) can’t read a fly ball

It’s legit - this defense is horrible.


#3    studes      (see all posts) 2008/06/06 (Fri) @ 11:49

I had the simple idea of incorporating team-level WPA stats into the Win Shares team splits, and using them to make the “pythagorean” adjustment to Win Shares.  That is, keep everything else the same (for now) but use WPA instead of the runs scored/allowed variance to divvy up wins between offense and defense.

Unfortunately, this won’t work if the run environment isn’t set correctly.  And it should ideally be set correctly on a park level.  Any ideas about how to get around this?


#4    dkappelman      (see all posts) 2008/06/06 (Fri) @ 12:35

Dave, by the end of the month, 2008 will be park adjusted with a more accurate run environment.  Doing an in-season run environment is a little tricky since I can’t really rerun the whole thing every day. 

Also, by the end of today I plan to have 1974-2007 up and park adjusted (no 1999).


#5    Tangotiger      (see all posts) 2008/06/06 (Fri) @ 12:36

Well, I believe that David is adjusting at the park level (or at least, is planning to, for 2007 and earlier).

I’m guessing that you might be better off the other way, that Win Shares has the appropriate off/def split, and you can therefore adjust WPA from that split.


#6    studes      (see all posts) 2008/06/06 (Fri) @ 13:00

David, that’s great news.  I didn’t realize your WPA environments were set on the park level.  I take it you don’t plan to run WPA before 1974 cause the data is incomplete?  (If it matters, I’d vote for you to run them anyway.)

Tango, I don’t understand your comment.  When you say that Win Shares has the appropriate off/def split, what are you referring to?


#7    dkappelman      (see all posts) 2008/06/06 (Fri) @ 13:39

Dave, they’re currently not (league only), but I just finished rerunning everything with park adjustments so they will be today hopefully.  Just need to check a few things and re-upload the data.

I’d say the chances that we’ll eventually have prior to 1974 is quite high, but I’m not sure how soon it’s going to happen.  Maybe in the off-season.


#8    Tangotiger      (see all posts) 2008/06/06 (Fri) @ 13:54

Presuming I’m following Win Shares properly, if a team is top-heavy in pitching, that team will get more than the 35% split in Win Shares that the typical team gets.  I’m presuming that this can be done accurately, because Win Shares handles park adjustments in as appropriate a fashion as it can.

The WPA split between off/def is entirely dependent on using the right run environment to generate the win environment.  If you use a runs per game of 3.0, then a WPA model will severely shortchange pitchers, as that run environment will presume that any run allowed will be very costly.  So, a team that allows 4.5 runs per game will be continually in the negative.

Win Shares however, figures out the correct run environment (I presume) and therefore is able to allocate the correct split at the off/def level.  Pure assumption on my part.
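To put rough numbers on the 3.0 runs per game example above, here is a small sketch. It swaps in the Pythagorean formula (exponent 2) as a stand-in for a real win expectancy chart, so it is not how Fangraphs or a WPA model actually computes anything; the function names are hypothetical. It only shows the direction of the bias when the assumed run environment is wrong.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean win percentage; a stand-in here for a win expectancy model."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def implied_pitching_win_pct(runs_allowed_per_game, assumed_env):
    """Win pct credited to a staff when the model assumes an average
    offense scores `assumed_env` runs per game (an illustrative assumption)."""
    return pythag_win_pct(assumed_env, runs_allowed_per_game)


# A staff allowing 4.5 runs per game, judged against three assumed environments.
too_low = implied_pitching_win_pct(4.5, 3.0)    # environment set too low
matched = implied_pitching_win_pct(4.5, 4.5)    # environment set correctly
too_high = implied_pitching_win_pct(4.5, 5.0)   # environment set too high

print(round(too_low, 3), round(matched, 3), round(too_high, 3))
```

With the environment set too low the staff looks well below .500, with it matched the staff is exactly average, and with it set too high the staff picks up credit it didn’t earn, which is the pitcher-friendly direction described in the post.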


#9    studes      (see all posts) 2008/06/06 (Fri) @ 15:02

Tango, the off/def split for the team is where the “pythagorean” split problem occurs.  Win Shares assumes that any differential between the pythagorean record and the actual record is allocated according to the same ratio that runs scored and allowed are.  I think many of us feel that this is the second biggest problem with Win Shares (the first being no Loss Shares).

At THT, we’ve already tackled the Loss Shares issue.  I think that using WPA for the off/def split appropriately handles the pythagorean split problem.  The key is using the right WPA for the given environment.


#10          (see all posts) 2008/06/06 (Fri) @ 19:48

Is there any specific reason that 1999 is excluded from a lot of studies and applications? If it’s missing data, why is that so?


#11    Rally      (see all posts) 2008/06/06 (Fri) @ 21:36

Retrosheet does not have data for 1999, but I think they are expecting to add 1999 sometime this year.

I have no doubt the Mariners team is a poor defensive one, it’s not a park illusion.  In this same park, the Mariners once posted incredible defensive numbers.

The current squad just doesn’t make the same plays they did when they had an OF with both Mike Cameron and Ichiro, and Olerud and David Bell on the corners.

I’m convinced they are terrible, but just wonder where to give the bulk of the blame: my stats say Betancourt and my eyes say the problem is Lopez.  Both stats and eyes are repulsed by Ibanez in left.


#12    KJOK      (see all posts) 2008/06/09 (Mon) @ 16:51

IIRC Total Sports held the ‘rights’ to the 1999 play-by-play data somehow, and then they went bankrupt.  Not sure where that all stands anymore, but Retrosheet has been saying they’re going to have 1999 data for a while.


#13    SirKodiak      (see all posts) 2008/06/09 (Mon) @ 19:53

This article from THT by Sean Smith from March 25, 2008 says:

I will provide an updated spreadsheet with all revised ratings from 1956 to 2007, except for 1999, and the double play ratings. In addition, I am working with Sean Foreman to add these ratings to Baseball-reference.com. When the data are ready you can access these ratings from player pages, and also see the split data, such as home and away fielding. As another bonus, Sean has obtained play-by-play data for the 1999 season, so you will be able to see the ratings from that year on his site as well.

No idea what this means for Retrosheet obtaining them though.

