
THE BOOK--Playing The Percentages In Baseball


Monday, February 08, 2010

Pythag records

By Tangotiger, 11:28 PM

Rich Rifkin:

In his one season at the helm in Seattle, his club had a P-record of 75-87. The three years prior the Mariners had P-records of 67-95 (’08), 79-83 (’07) and 78-84 (’06). In other words, of Seattle’s last four years, the team Zduriencik put on the field was more or less the same as the ballclubs Billy Bavasi brandished.

Why not try the pythag record based on bases and outs (for, allowed) rather than runs?  After all, we prefer looking at individual OBP and SLG, rather than runs, RBIs and ERA, don’t we?  So why would we then switch to preferring RS and RA at the team level?


#1    David Cameron      (see all posts) 2010/02/09 (Tue) @ 03:38

I’ve been beating this drum for a couple of years now.  Pythag is a lazy shortcut that pretends to provide analysis, but in general, is just a crutch for people to stand on when they don’t want to really do the work.  It works enough that it’s caught on, unfortunately, but doesn’t work often enough that it shouldn’t be relied on for any kind of real conclusion.


#2    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 08:29

Just to make the point: if you look at OBP, SLG, SB, etc (i.e., component-runs scored and component-runs allowed), the 2009 Mariners are at least a .500 team.


#3    Charles Saeger      (see all posts) 2010/02/09 (Tue) @ 10:57

Perhaps one should try it for your favorite Runs Generated equation (Runs Created, Base Runs, Linear Weights, Extrapolated Runs, whatever).

As a question, where is there a study showing that teams regress their records to what pythagoras predicts? This has been taken for granted for quite a bit, but I don’t remember any study, and I’ve read all the Ballantine Abstracts.


#4    Sky      (see all posts) 2010/02/09 (Tue) @ 11:01

So would you say that the M’s were unlucky in how their events combined into runs and then lucky in how their runs combined into wins?

Or is there some connection between the events and the wins somehow?


#5    jinaz      (see all posts) 2010/02/09 (Tue) @ 11:02

I had them as a 0.461 team based on component winning percentage last season:
http://www.beyondtheboxscore.com/2009/10/11/1079698/btb-power-rankings-end-of-2009
(Converting Runs to Wins table)

Adding in the league adjustment put them at 0.487.

2nd worst offense in the AL, 3rd worst pitching, but easily the best fielding.  If I had used FIP instead of tRA, pitching would instead have come in 7th-worst in the AL.
-j


#6    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 11:12

MGL had a good article in one THT annual and Phil made it in his SABR presentation on this issue as well.  I’m sure they can offer insights as well.

***

The basic point is this: there is no such thing as a “team”, but a collection of players.  Since we all evaluate players based on their components, why do we then completely throw that out, and evaluate teams based on their runs scored and allowed (i.e., R, RBI, and ER)?  That makes no sense.  None!  We’re saying that while individually, the Mariners offense should have scored say 7000 runs based on their OBP, SLG, SB, baserunning, etc (number for illustration purposes ONLY), we’re going to ignore that and instead treat them as scoring 6500 runs.  (I used ridiculous numbers because I know someone’s going to quote numbers that I just pulled out of my butt.) And we’ll do the same on the defense side.  It doesn’t make any sense.

Now, if you want to evaluate the players’ ability with men on base and bases empty, with the game close and whatnot, then fine, do that.  But, do NOT rely on the team-level data to tell you anything about that.  Do it at the player level, and then add it up.  But, 99.99% of people are lazy, and will rely on team-level data instead.

You evaluate players as players, and you evaluate teams as a collection of players.


#7    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 11:14

Let’s see.  The Mariners offense had a .258/.314/.402 line, and the Mariners defense had a .247/.316/.394 line.  Looks to me like the Mariners offense was a bit *better* than the Mariners opponents.  They were “unlucky” enough to score fewer runs than they allowed compared to their bases and outs, and “lucky” enough to win more than they lost, based on their runs.

Why would their RS/RA tell you something more than their OBP/SLG on off/def?
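To put a rough number on the slash-line comparison above, here is a minimal sketch that multiplies OBP by SLG for each side. Using OBP × SLG as a run-production index is an assumption made here for illustration (it is roughly basic Runs Created per at-bat) and not something the comment itself computes; only the four OBP and SLG values come from the thread.

```python
# Crude run-production index: OBP * SLG, roughly basic Runs Created per at-bat.
# The proxy is an assumption for illustration; the slash lines are from the comment.

def obp_times_slg(obp: float, slg: float) -> float:
    """Return OBP * SLG as a rough per-at-bat run-production index."""
    return obp * slg

mariners_offense = obp_times_slg(obp=0.314, slg=0.402)
mariners_opponents = obp_times_slg(obp=0.316, slg=0.394)

# A ratio above 1 means the offense's line was the stronger of the two.
ratio = mariners_offense / mariners_opponents

print(f"offense index:   {mariners_offense:.4f}")
print(f"opponents index: {mariners_opponents:.4f}")
print(f"offense / opponents: {ratio:.3f}")
```

On these inputs the offense comes out about 1% ahead, before any credit for stolen bases or baserunning.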


#8    jinaz      (see all posts) 2010/02/09 (Tue) @ 11:27

Tango/7,

I’m not sure if that was a response to me or not.  But just in case, to be clear, the number I cited is based on park adjusted wRC, BPro baserunning, a DIPSy pitching stat, and fielding (including catchers).  No actual runs scored or allowed are used; it’s all component-based estimates of RS and RA fed into Pythagenpat.  Ultimately, what you’re talking about is the entire motivation behind that power ranking project.

Offense (park adjusted, including EqBRR baserunning) I have SEA at wOBA = 0.319.  By tERA I have pitchers at 4.66 (though FIP = 4.38--I do some park adjustments on FIP so it may differ from fangraphs...next year I’ll probably use xFIP instead of this home brew tRA of mine because I trust it more).  Fielding, including catching, I have at +80 runs, which is way over the 2nd-ranked Reds at +59 runs.
-j


#9    jinaz      (see all posts) 2010/02/09 (Tue) @ 11:45

Whoops, FIP should have read 4.48.  And for some reason, looking back at my spreadsheet, it’s now reporting 4.51.  I think they’re different because I’m now using 2009 HR park factor numbers from Patriot to adjust HR rate, whereas before I was using 2008.

Anyway, just wanted to say that if I use the FIP number instead of tRA to figure estimated runs allowed, I get:

783 - 80 runs (fielding) = 703 eRA (I got 731 eRA using tRA & fielding).

That’s compared to an estimated runs scored of 665 based on wRC and EqBRR (removing steals), with park adjustments.  Still puts them a few wins below 0.500, though obviously it puts them closer.

I don’t know why the slash line you cited wouldn’t agree with what I’m doing.  Maybe I’m not giving credit for fielding, or there were some “luck” factors that feed into the defensive slash line that aren’t accounted for with a DIPS/Fielding approach...?  Or it’s internal weighting issues within OBP and SLG that aren’t borne out in wRC.
-j
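For readers who want to see how estimates like the ones above turn into a winning percentage, here is a minimal Pythagenpat sketch. The 0.287 exponent constant and the 162-game schedule are assumptions (a commonly cited form of Pythagenpat, not necessarily the exact settings behind the power rankings); the 665, 703 and 731 run estimates are the ones quoted in the comment.

```python
def pythagenpat_wpct(runs_scored: float, runs_allowed: float,
                     games: int = 162, patriot_exp: float = 0.287) -> float:
    """Expected W% where the Pythagorean exponent floats with the run environment."""
    runs_per_game = (runs_scored + runs_allowed) / games
    exponent = runs_per_game ** patriot_exp  # assumed constant, see lead-in
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

est_runs_scored = 665  # wRC + EqBRR estimate quoted above

# Two estimated runs-allowed figures quoted above.
for label, est_runs_allowed in (("FIP + fielding", 703), ("tRA + fielding", 731)):
    wpct = pythagenpat_wpct(est_runs_scored, est_runs_allowed)
    print(f"{label}: W% = {wpct:.3f}, about {wpct * 162:.1f} wins in 162 games")
```

Both versions land under .500, with the FIP-based one closer to it, which matches the direction described in the comment.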


#10          (see all posts) 2010/02/09 (Tue) @ 11:53

I don’t understand what the pythagorean record (terrible name, btw) is trying to measure.  Or more specifically, what the difference between the pythagorean record and the actual record is trying to measure.  Is the difference supposed to be due to luck?  Or is it due to something like “intangibles” and the skill of the coaching staff?  Or something else that is difficult to measure directly?

I think if we want to know how lucky a team has been over a season, it would be better to try to define “luck” and measure it directly.  And then you have to figure out the difference between luck and strategy.

My guess is that “lucky” teams over a season will have fewer injuries than the norm, and more positive outlier seasons from players than the norm (i.e., a player who could be expected to hit 20 home runs hits 30, but doesn’t do that in any other season of his career).  With unbalanced schedules, maybe who teams play over the course of a season, and when (teams change during a season) will be a factor.  Simply comparing runs scored and runs allowed with records won’t capture a team winning or losing a lot due to one-off, random factors because these factors themselves will be causing or preventing the runs.

Some teams emphasize putting lots of players on base, and there is a lot of evidence that this is a good approach, but the flip side of that is that they will wind up with lots of players stranded on base.  The team will look bad if you compare players on base with runs scored.  Some teams have feast or famine offenses where they might score ten runs in one game, then zero the next.  A team with one dominant and four weak starting pitchers will look different than a team with five mediocre starting pitchers, and so on.

I’m just not sure what the pythagorean record is telling me.  Is it a good sign or a bad sign that a team is winning an unusually large number of close games?  I have a hazy impression that this is a good sign, but has any substantive work been done on this question?


#11    jinaz      (see all posts) 2010/02/09 (Tue) @ 12:09

Ed/10,

Pythagorean record basically tells you the winning percentage you’d expect based on your team’s actual runs scored and runs allowed.  It’s a model of how runs scored and allowed typically convert to wins.  Looking at it is a way of getting past some of the timing of when runs are scored, which is what ultimately causes differences between actual and pythagorean w%.  Many of those timing events are random, though some (like when good relievers are leveraged) probably are not.

A component-based approach like Tango, Dave, etc are advocating takes another step back and looks at how well a team performed based on their component statistics.  Tango reported slash lines--how we evaluate hitters--at the team level to evaluate offense relative to defense.  This gets away from even more of the timing events, like clutch hitting, that affect how components are converted into runs.  But again, most of those timing events are fairly random.  The result is arguably a better view of how well a team performed than straight-up pythagorean records, much less winning percentage, provide.
-j
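As a concrete version of the runs-to-wins model described above, here is a minimal sketch of the classic Pythagorean formula with a fixed exponent of 2. The 692 runs allowed is the figure quoted elsewhere in this thread; the 640 runs scored is not quoted in the thread and is supplied here as an assumption, as is the 162-game schedule.

```python
def pythagorean_wpct(runs_scored: float, runs_allowed: float,
                     exponent: float = 2.0) -> float:
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

runs_allowed = 692  # 2009 Mariners actual RA, as quoted in the thread
runs_scored = 640   # assumption: not quoted in the thread

wpct = pythagorean_wpct(runs_scored, runs_allowed)
expected_wins = wpct * 162  # assumed 162-game schedule

print(f"Pythagorean W%: {wpct:.3f}")
print(f"Expected wins:  {expected_wins:.1f}")
```

Under those inputs the expectation rounds to 75 wins, in line with the 75-87 P-record quoted in the post.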


#12    Peter Jensen      (see all posts) 2010/02/09 (Tue) @ 12:28

jinaz - If you are trying to estimate how many games a team should have won, why would you park adjust the numbers, since they do have to play half their games in that park?


#13    jinaz      (see all posts) 2010/02/09 (Tue) @ 12:35

Peter/12,

Mostly because I also like to look at how teams stack up against one another in the component stats--offense vs. pitching vs. fielding.  I agree that if all I was interested in doing was estimating team winning percentage, it wouldn’t make much sense to do park adjustments. 

That said, you could argue that DIPSy stats are inherently park adjusted by themselves (they usually assume a neutral BABIP environment), so perhaps it is worth it to park adjust everything to (try to) make sure everything’s compared in a neutral park context.
-j


#14    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 12:51

783 - 80 runs (fielding) = 703 eRA (I got 731 eRA using tRA & fielding).

That’s compared to an estimated runs scored of 665 based on wRC and EqBRR (removing steals), with park adjustments.  Still puts them a few wins below 0.500, though obviously it puts them closer.

The slash lines are extremely close, even a plus to the Mariners, after you account for SB and baserunning.  Yet your component analysis is showing a differential of -38 to -66 runs.

Your component runs allowed is showing 731 runs allowed, while the avg AL team allowed 771 runs.  You’re going to have to explain how the Mariner pitching+fielding was only 40 runs better, when their OBP/SLG line was, by far, the best in the AL.

Indeed, like I said, they have similar OBP/SLG lines, and yet your component analysis is showing as much as 66 runs of difference.

Something’s wrong, no?


#15    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 12:57

My guess is you are not applying your park factors consistently, and you are over-regressing the fielding (when you actually shouldn’t regress it at all in this particular case).


#16          (see all posts) 2010/02/09 (Tue) @ 13:07

Dave may not want to link to his article from earlier this offseason, but he goes through this argument using WAR:

http://ussmariner.com/2009/10/05/war-and-the-2009-mariner/


#17    jinaz      (see all posts) 2010/02/09 (Tue) @ 13:33

I don’t think what I’m doing is necessarily wrong, given that, for example, I’m using DIPS-based pitching.  It could be that the slash line is misleading because of BABIP or something and what I’m doing is more indicative of actual performance.

It is the case that I’m not applying park factors consistently, because I don’t think you can use the same park factors on a DIPS stat as on an overall RC stat like wRC.  I’m applying park factors to components going into tRA (or FIP), whereas I’m just doing an overall runs adjustment to the wRC data.  It’s a fair critique, but I preferred to just use wRC since it was available rather than deal with my own linear weights. 

Fielding: no regression, but I do average together two fielding estimates.  UZR had Seattle at +85 runs, whereas THT’s batted ball team fielding stat had them at +61 runs.  Average is 73, plus 7 runs for plus catching.  So yeah, if you just use UZR + catching (no THT), you save 12 more runs.

As for pitchers, I think I’m pretty close to the mark:
Actual runs allowed for Seattle was 692.  If you park adjust it (again, using PF = 0.98), it moves to 706.  Then, add back in the fielding estimate above (+73 runs) and that gives an estimated park neutralish 779 RA by pitchers alone.

“My” FIP-Runs estimates 783 RA.  Pretty much dead on.  I know we’re trying to not be wed to runs allowed here, but on average we should do a good job of estimating them.  I’m not getting a massive disparity between RA and eRA here with respect to pitcher performance.

So, maybe I’m missing on fielding.  I think the pitcher estimate is pretty close.

My hunch is that another part of this is that the Mariners had two forms of luck cancel here.  They allowed a “lucky”-low number of hits (which FIP doesn’t track but the slash line does).  But the hits they allowed translated into an “unlucky” number of runs.  So, maybe the slash line misleads a little bit too.  Just a guess, but it helps explain our disparity here.
-j
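The arithmetic in the comment above can be retraced in a few lines. This is only a sketch of the steps as described there; dividing by the park factor is the reading assumed here for "park adjust", and every number is taken from the comment.

```python
# Fielding: average two estimates, then add catching.
uzr_runs = 85       # UZR team fielding
tht_runs = 61       # THT batted-ball team fielding
catching_runs = 7

fielding_avg = (uzr_runs + tht_runs) / 2        # 73.0
fielding_total = fielding_avg + catching_runs   # 80.0

# Pitching check: park-neutralize actual RA, then add fielding back in.
actual_ra = 692
park_factor = 0.98
park_neutral_ra = actual_ra / park_factor            # assumed: divide by PF
pitcher_only_ra = park_neutral_ra + fielding_avg     # the comment adds back +73

fip_runs_estimate = 783
gap = fip_runs_estimate - pitcher_only_ra

print(f"fielding total:       {fielding_total:+.0f} runs")
print(f"park-neutral RA:      {park_neutral_ra:.0f}")
print(f"pitcher-only RA:      {pitcher_only_ra:.0f}")
print(f"FIP-runs minus that:  {gap:+.1f}")
```

The FIP-based estimate and the backed-out pitcher figure differ by only a handful of runs, which is the "pretty much dead on" claim in the comment.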


#18    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 14:46

In order to really explain it, you have to show how the Mariners’ +79 actual runs allowed is broken down, so that you can get +40 component, park-adjusted runs out of it.

And then do the same for the hitting.

So, for example, you can say:
+30: DIPS
+15: BABIP
+5: sequencing
+29: UZR

And then say:
-19: Park

And so, you decide: BABIP counts as 0, sequencing as 0, and I’m left with +30, +29, -19 = +40

And then repeat something similar for hitting.

(All numbers for illustration purposes only.)
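The bookkeeping being asked for can be sketched as a small ledger. The figures are the illustrative ones from the comment, not real estimates, and the choice of which pieces to credit is the one the comment walks through.

```python
# Illustrative breakdown of runs saved relative to average (numbers from the comment).
breakdown = {"DIPS": 30, "BABIP": 15, "sequencing": 5, "UZR": 29}
park_adjustment = -19

# Everything together reproduces the actual runs-allowed advantage.
actual_total = sum(breakdown.values())

# The component view credits only some pieces (BABIP and sequencing count as 0),
# then applies the park adjustment.
credited = {"DIPS", "UZR"}
component_total = sum(v for k, v in breakdown.items() if k in credited) + park_adjustment

print(f"actual runs better than average:    {actual_total:+d}")
print(f"component, park-adjusted runs:      {component_total:+d}")
print(f"left unexplained by the components: {actual_total - component_total:+d}")
```

Laying it out this way makes it clear which line items account for the gap between the two totals.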


#19    Tim J      (see all posts) 2010/02/09 (Tue) @ 14:49

I would LOVE to see some kind of adjusted standing report kept up on Fangraphs based on component stats that updated through the season.  They already have the raw data in their team reports, it wouldn’t be that hard to add the WAR up and show it for the different teams. 

I like BPro’s adjusted standings report but at this point that and the Playoff odds reports are the only reasons I go to that site anymore.  It would be great if FG could incorporate those into their own site so everything I want to see is in one place.  Especially if they could make it accessible from the mobile app.


#20          (see all posts) 2010/02/10 (Wed) @ 15:42

I think of sabermetrics as having identified at least three major elements of “randomness” effects: (1) the degree to which balls in play fall in for hits, (2) the degree to which offensive bases are bunched to create runs, (3) the degree to which runs are bunched to create wins.

Each of these three has a standard expectation, random variation from which will create, respectively, more or fewer: (1) hits than expected given the number of balls in play generated, (2) runs than expected given the number and type of offensive bases generated, and (3) wins than expected given the number of runs generated or prevented.  FIP/DIPS analysis seeks to identify the randomness elements at play in (1).  Traditional pythagorean expectation analysis seeks to identify the randomness elements at play in (3). Comparing team Runs Created (or a similar run predictor) to team Runs can be used to identify the randomness elements in (2).

If one wants to combine two or more of these three randomness-identifying techniques into a single analysis, one can do that, but each of the three can also be legitimately used separately for the particular narrow purposes it was specifically designed to achieve.
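One way to combine layers (2) and (3) into a single analysis is to express both in wins, as in the sketch below. The team's numbers are entirely hypothetical, the fixed exponent of 2 is an assumption, and layer (1) is left out because it would need DIPS-style inputs that are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class TeamSeason:
    wins: int
    games: int
    runs_scored: float
    runs_allowed: float
    component_runs_scored: float   # e.g. from Runs Created or a similar estimator
    component_runs_allowed: float

def pythag_wins(rs: float, ra: float, games: int, exponent: float = 2.0) -> float:
    """Expected wins from runs, using a fixed Pythagorean exponent (assumed 2)."""
    return games * rs ** exponent / (rs ** exponent + ra ** exponent)

def luck_layers(t: TeamSeason) -> dict:
    """Split actual wins into component wins plus two 'bunching' gaps."""
    from_runs = pythag_wins(t.runs_scored, t.runs_allowed, t.games)
    from_components = pythag_wins(t.component_runs_scored,
                                  t.component_runs_allowed, t.games)
    return {
        "component_wins": from_components,
        "events_to_runs": from_runs - from_components,  # layer (2)
        "runs_to_wins": t.wins - from_runs,             # layer (3)
    }

# Hypothetical team, for illustration only.
team = TeamSeason(wins=88, games=162, runs_scored=700, runs_allowed=700,
                  component_runs_scored=720, component_runs_allowed=690)
layers = luck_layers(team)
for name, value in layers.items():
    print(f"{name}: {value:+.1f}")
```

By construction the three pieces add back up to the team's actual win total, so each gap can be read on its own or together with the others.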

