THE BOOK--Playing The Percentages In Baseball

Thursday, May 28, 2009

Simple Zone Rating

By Tangotiger, 08:38 AM

Colin gives us his fielding system.  One thing I’d like to see is to have this data (plus the OF and C ones) normalized so that the plays made add up to the total number of outs made:

1B:
.85*A + .08*PO
2B:
.85*A
SS:
.85*A
3B:
.90*A+.06*PO

That is, perhaps some team should have .86 or .83, etc.


#1    Peter Jensen      (see all posts) 2009/05/28 (Thu) @ 09:18

Colin-

Using 2005-2008 data, I get very different conversion rates from infielder assists and putouts to plays made than you do.

1B = .91A + .09PO
2B = .83A + .01PO
SS = .87A + .02PO
3B = .98A + .05PO

Perhaps I have made an error, but you may want to check your math just to make sure of your numbers. 

Groundball rates for hitters and pitchers were regressed to the mean before use. I used the weighted average method from Tom Tango, figuring R from the method involving random and observed variance.

I am not sure why you would do this, as regression is usually only used to project what a player will do in the future.  When dealing with past events the actual rates should be used.  And Odds Ratio is a better method than Log 5, which can have problems with rates that differ markedly from average.

I know you want to keep it simple, but I would prefer to see 3 of the hit allocation charts; 1 for singles, 1 for doubles, 1 for triples and inside park homers.  No reason to give partial hit values for triples and home runs to catchers, pitchers, 2Bs, and SSs as these almost never occur. And the allocation of doubles and singles would differ a lot as well.

Since out of the park HRs are not usually separated from inside the park HRs, I assume you must have made a decision somewhere in the transition from dead ball era to modern era to have all HRs either one or the other.  Where did you make the dividing line?


#2          (see all posts) 2009/05/28 (Thu) @ 09:55

Hal Chase as one of the worst fielders of all time?


#3    joe arthur      (see all posts) 2009/05/28 (Thu) @ 10:21

Peter -
log5 is the same as odds ratio; you can manipulate one algebraically to arrive at the other.
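
A quick numerical check of that equivalence, as a minimal sketch. It assumes the three-input form of log5 (batter rate, pitcher rate, league rate); the function names are made up for illustration.

```python
def log5(batter, pitcher, league):
    """Log5 matchup estimate with an explicit league-average rate."""
    num = batter * pitcher / league
    den = num + (1 - batter) * (1 - pitcher) / (1 - league)
    return num / den


def odds_ratio(batter, pitcher, league):
    """Odds ratio: multiply the two odds, divide by the league odds."""
    def odds(p):
        return p / (1 - p)

    matchup_odds = odds(batter) * odds(pitcher) / odds(league)
    # Convert the matchup odds back to a probability.
    return matchup_odds / (1 + matchup_odds)


# Compare the two on ordinary rates and on rates far from average.
cases = [(0.45, 0.40, 0.43), (0.05, 0.90, 0.43), (0.80, 0.75, 0.20)]
gaps = [abs(log5(*c) - odds_ratio(*c)) for c in cases]
```

The gaps are zero up to floating-point noise, so whatever behaviour one method shows on extreme rates, the other shows too.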


#4    Peter Jensen      (see all posts) 2009/05/28 (Thu) @ 10:27

Thanks Joe I didn’t realize that.  Then do they both have problems with ratios that are far from average?


#5    Rally      (see all posts) 2009/05/28 (Thu) @ 10:47

Hal Chase had defensive talent to be sure.  Maybe he was just throwing games?

Or perhaps he has low assist totals because he took the plays himself.

For the pre-1990’s retrosheet stuff, Colin is on the same track as TotalZone, and some of his results are really similar - Brooks, Ozzie, Belanger.  The big difference I see is he’s using hit values based on batter vs pitcher handedness, while TotalZone goes a step further and creates a separate table for every batter.

It will be interesting to see how close the results are, since his should be much easier to calculate.

And that is pretty much how I intend to handle the pre-retrosheet stuff.  Assign hits based on handedness of the pitcher (which is all you have to go on) and figure plays made from the primitive stats of putouts and assists.

It will be a challenge to deal with outfielders (since we have limited info on which position they played) and estimating innings.


#6    Colin Wyers      (see all posts) 2009/05/28 (Thu) @ 11:02

I can update log5 to be the odds ratio. I’m actually trying to figure out a better method still, because I’m pretty sure that either assumes that each side has an equal share in deciding the outcome and I don’t think that’s true (simply because the amount of regression used in each case is markedly different). I used the straight-line approximation because it was easier to code for the time being.

As for why to regress at all - I am essentially using GB rate (actually air ball rate, although since they’re rates it’s all the same in the wash) on outs to predict the rate on outs. For the years in which this is an important concern, we don’t have batted ball information on hits.

Just as a for instance - if we have a player who has 10 at bats, flies out seven times and gets three hits, there’s no reason for us to assume that all of his hits came on fly balls as well. The regression amount for hitters is a very small amount; typically it’s about 7-8 batted balls at the league average added to each hitter’s rate, IIRC. (I recalculate the regression amount and the league average rate per season because the amount of data and the source can change significantly from season to season.)
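
A rough sketch of that kind of regression, where a fixed number of league-average batted balls is added to each hitter's line. The league rate of 0.40 and the ballast of 7.5 balls are placeholders for illustration, not the values actually used, and `regress_rate` is a made-up name.

```python
def regress_rate(successes, trials, league_rate, ballast):
    """Add `ballast` trials at the league rate before computing the rate."""
    return (successes + ballast * league_rate) / (trials + ballast)


# The example above: 7 outs, all of them in the air.
raw = 7 / 7
# Assumed inputs: league air-ball rate 0.40, ballast of 7.5 batted balls.
regressed = regress_rate(7, 7, league_rate=0.40, ballast=7.5)
```

With those assumed inputs the estimate moves from 1.000 to (7 + 3) / 14.5, which is how the hits get a plausible split between ground and air instead of being treated as all fly balls.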

I haven’t yet done anything about inside the park HRs. If I can scrape all the data out of the HR log at B-Ref, then I won’t need to worry about drawing an artificial line like that, and can figure that out precisely at the team level.

I like the idea of figuring out separate hit allocation charts based upon what kind of hit it was. I’ll definitely look into that for the Retroera; it might be harder to figure for the pre-PBP years.

I’ll work on the normalization for the pre-PBP years as well, Tom. There’s a lot of great ideas here - keep them coming!

(Also, I’m planning on publishing the full SQL source so that eventually others can actually take parts that they think can be improved and make the changes themselves if they like.)


#7    Colin Wyers      (see all posts) 2009/05/28 (Thu) @ 11:10

Rally, if you want to check the non-PBP data against TZ, here it is:

http://cid-502c55b8429a795d.skydrive.live.com/self.aspx/Public/szr|_no|_pbp|_infield.xls

That’s a BIG file, about 10 MB. I should have made it a CSV, but I wasn’t thinking about it at the time.

And here’s the full set for the Retroera:

http://cid-502c55b8429a795d.skydrive.live.com/self.aspx/Public/szr|_infielders|_v0.001.csv

That file is out of date by a few days - it doesn’t include the GB/FB adjustment.

Also, Peter, I used Retrosheet data for all years to figure the number of plays to award per PO and A, and then smoothed the figures out a little to get the even .05 step increments. I don’t think it pays to get too cute with the PO/A to plays conversion, and any of the problems there should be solved when I go ahead and “normalize” per Tango’s suggestion.

I’ll look at Chase and see what I can’t figure out there. Sometimes for the pre-PBP years I think we may have to just bite the bullet and accept some bad results.


#8    Peter Jensen      (see all posts) 2009/05/28 (Thu) @ 11:13

Rally - Retrosheet has games played for outfielders for each outfield position going back to the 1870s.  In a few games an outfielder played multiple positions so the innings would have to be divvied up.  But it should be good enough information for the kind of rough estimates that any metric like yours or Colin’s is going to generate.


#9    Peter Jensen      (see all posts) 2009/05/28 (Thu) @ 11:43

Colin - It’s no harder to use the right numbers.  Your 3B and 1B numbers aren’t even correct to .05.  How can you normalize if you don’t start with the correct formulas to begin with?  Normalizing is going to be difficult anyway since you won’t know outs on bases for most years which will vary a lot from team to team with caught stealing.  And normalizing will get you farther away from a “simple” metric than other improvements.


#10    Colin Wyers      (see all posts) 2009/05/28 (Thu) @ 12:38

Here are the numbers I came up with, Peter, looking at all the Retroera:

FLD_CD  %PO   %A
1       0.06  0.92
2       0.00  0.21
3       0.08  0.86
4       0.02  0.83
5       0.07  0.96
6       0.05  0.87

You’re right that it’s no harder to use the “correct” figures, but if we’re seeing as much variation in these figures as your research suggests, we have no reason to think they’re any more accurate, and I thought as a presentational issue simplifying it might be useful. (Looking at it again, I should be crediting shortstops for their POs, however.)

Also, these figures do not match up with catchers because I adjusted those figures (roughly) based upon CS totals. Those figures will need to be adjusted for earlier years.


#11    Peter Jensen      (see all posts) 2009/05/28 (Thu) @ 13:05

The POs seem high to me.  I queried POs on GBs fielded by the fielder on a play where he does not also have an assist, and then divided by all his POs (including POs on line drives and pop ups).  It didn’t appear that you were including pop ups and line drives in your assessment of infielder’s ability but perhaps you were.  This seemed to best correspond to the limited information that you will have available for the non Retro PBP era.  Anything else double counts plays made where there is both an assist and PO on the same play.


#12    Colin Wyers      (see all posts) 2009/05/28 (Thu) @ 13:31

Here is how I defined plays made:

SELECT FLD_CD
-- PM: non-air-ball outs where the fielder has the first putout or the first assist
, SUM(IF(BATTEDBALL_CD NOT IN ('F','L','P') AND (PO1_FLD_CD = FLD_CD OR ASS1_FLD_CD = FLD_CD),1,0)) AS PM
-- PM_PO and PM_ASS: the same plays, split by first putout and first assist
, SUM(IF(BATTEDBALL_CD NOT IN ('F','L','P') AND (PO1_FLD_CD = FLD_CD),1,0)) AS PM_PO
, SUM(IF(BATTEDBALL_CD NOT IN ('F','L','P') AND (ASS1_FLD_CD = FLD_CD),1,0)) AS PM_ASS
FROM retrosheet.events e
WHERE BAT_EVENT_FL = 'T'
AND BAT_DEST_ID = 0
AND EVENT_CD != '3' -- drop strikeouts
GROUP BY FLD_CD;

Looking over it right now, this excludes fielder’s choices from plays made, which wasn’t intentional. Flies, liners and popups are all excluded from plays made. (Assist and PO totals were taken from the Baseball Databank over that time period.)


#13    Dackle      (see all posts) 2009/05/28 (Thu) @ 13:59

I’m sorry if I read the article too quickly and misinterpreted something, but ... for plays made by first basemen I think you need to include team assists somewhere. Regressing putouts and assists by first basemen onto baseball-ref’s definition of balls fielded (in the second table on this page: http://www.baseball-reference.com/leagues/MLB/2008-specialpos_1b-fielding.shtml), I get: balls fielded = (.145 * putouts) + (1.167 * assists).

If you sort all 30 teams by the difference between predicted and actual plays made, and put them into three groups, you get the following averages for each group ("Top" are the 10 teams who made more plays than predicted by their POs and As, "Mid" is the middle 10, "Bot" made fewer actual plays than predicted, "Cor" is the correlation between the statistic and the difference between actual and predicted plays):

       First base    Team overall        Plays made   
Group  PO     A      A   GB/FB  GO/AO  Pred  Act  Diff
Top    1320  108   1546   0.76   1.01   317  344   +27
Mid    1396  103   1640   0.79   1.06   322  324    +1
Bot    1438  118   1729   0.83   1.16   346  321   -25

Cor    -.59 -.19   -.74   -.52   -.53  -.43  .37  1.00

So ... you can see pretty clearly that a groundball staff causes a higher number of assists and inflates the 1B assist total.
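
As a sanity check, here is a small sketch that plugs the group averages from the table into the regression line. The group numbers are rounded averages, so this only approximates the Pred and Diff columns; the function name is invented for the example.

```python
def predicted_balls_fielded(putouts, assists):
    """Regression of 1B balls fielded on putouts and assists (2008)."""
    return 0.145 * putouts + 1.167 * assists


# (1B putouts, 1B assists, actual plays made) per group, from the table.
groups = {
    "Top": (1320, 108, 344),
    "Mid": (1396, 103, 324),
    "Bot": (1438, 118, 321),
}

# Actual minus predicted plays for each group.
diffs = {
    name: actual - predicted_balls_fielded(po, a)
    for name, (po, a, actual) in groups.items()
}
```

The differences come out close to the +27, +1 and -25 in the table, in step with the team assist and GB/FB columns.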


#14    Peter Jensen      (see all posts) 2009/05/28 (Thu) @ 14:05

Yes, that coding counts either too many assists or too many POs because it counts both an assist and a PO on a single play made where the fielder has both.  GB to shortstop’s left with man on first, touches second, throws to first.  1 PO1, 1 ASS1, but only 1 play made.  I counted those as an assist play made, but not a PO play made.
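
A toy illustration of the double-count and of the two ways to resolve it. This is only a sketch: the tuples stand in for the FLD_CD, PO1_FLD_CD and ASS1_FLD_CD columns, and `classify_play` and `tally` are invented helpers, not anyone's actual code.

```python
from collections import Counter


def classify_play(fld, po1, ass1, prefer="A"):
    """Label one ground-ball out as an assist play or a putout play.

    fld is the fielder being credited, po1 and ass1 are the fielders with
    the first putout and first assist (None if there was no assist).
    """
    has_po = po1 == fld
    has_a = ass1 == fld
    if has_po and has_a:
        return prefer  # both on one play: count it once, under one label
    if has_a:
        return "A"
    if has_po:
        return "PO"
    return None


def tally(plays, prefer="A"):
    """Count plays made by (fielder, label)."""
    counts = Counter()
    for fld, po1, ass1 in plays:
        label = classify_play(fld, po1, ass1, prefer)
        if label is not None:
            counts[(fld, label)] += 1
    return counts


plays = [
    (6, 3, 6),     # routine grounder to short, out at first
    (6, 6, 6),     # SS steps on second, then throws to first
    (3, 3, None),  # first baseman takes it himself
]
assist_first = tally(plays, prefer="A")
putout_first = tally(plays, prefer="PO")
```

Either convention gives three plays made in total; the only thing that moves is which bucket the shortstop's force play lands in.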


#15    Tangotiger      (see all posts) 2009/05/28 (Thu) @ 14:50

Dackle/13 was marked for moderation and is now open.


#16    Colin Wyers      (see all posts) 2009/05/28 (Thu) @ 14:59

Right. I would count those as PO_PM, not A_PM, because I think most of the time the PO would come before the assist. (I don’t know that it makes a big difference either way so long as it’s consistent.) This shouldn’t affect the Retroera fielding at all. I’ll try and get that changed tonight, as well as some other, easier fixes. (I’ll be out of town for the weekend, which will slow down some of the work here.)


#17    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 15:15

Like Colin, I have also been working on a non-PBP fielding metric the last couple of weeks. But his is much simpler and more elegant. I have been using Retrosheet data from 1989-1999 since there is hit location data available for those years. Except for second basemen, I get different values for plays made using infielder assists:

2B: .852*A
3B: .986*A
SS: .884*A

I’m still working on a regression formula for unassisted groundball putouts for first basemen. I don’t think Colin should be including any putouts by third basemen.

For outfielders, it is a lot simpler to figure plays made. Just use putouts (all of them!).

Now I do have a few questions here. What are you using to figure the GB/FB rate? Are you using a percentage of IF assists and all outfielder putouts to figure this? I don’t see how you can do it any other way. What do you guys recommend using for a run-value for non-ROE errors for each of the positions (including pitchers and catchers)? This is where I am currently stuck.


#18    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 15:38

Colin, I have defensive innings estimates and LF/CF/RF breakdowns for all players prior to the Retrosheet era. I spidered the DFT’s from BP’s website and linked them to the BDB database. If you want, I can email you my fielding table.


#19    Peter Jensen      (see all posts) 2009/05/28 (Thu) @ 15:53

Colin - If I were you I would count those plays as assists.  Since you are trying to estimate GB plays made in the non PBP era you don’t want to use POs for infielders any more than you have to because the raw statistics will include a lot of air out POs and non hit ball POs.  Most of the assists will come on GB hit balls and therefore be more stable for estimating purposes.  Whether the PO comes before the assist doesn’t have any meaning for your estimation purpose.


#20    Rally      (see all posts) 2009/05/28 (Thu) @ 15:53

terpsfan101, I would be very interested in defensive innings estimates.  I think you have my email, but just in case,

rallymonkey (numeral for five) at comcast dot net


#21    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 16:27

Rally,

I am having a hard time exporting my fielding table from Access since it is more than 65,000 rows. I would like to keep everything in one table, but I might have to split it into 3 tables.


#22    Colin Wyers      (see all posts) 2009/05/28 (Thu) @ 16:27

Okay, Hal Chase is a real problem. His PO totals are astronomical compared to his A. So first base plays should probably look like this:

PO-Inn*OIF_A_RT

Where OIF_A_RT is the rate of other infielder assists per inning.

I’ll mull over the PO/A a while longer, Peter. I take your point about the assist being a better measure for IF talent on the whole.

terpfan, go ahead and send me that. E-mail is:

pontifexexmachina at hotmail.com

I have my own G to Inn conversion right now, but it could use some work. (I don’t want to rely on any part of the DFTs as an input, because I want a standalone system, but it couldn’t hurt to test against the DFTs to at least see where they’re similar and where they’re different.)


#23    Gary Geiger Counter      (see all posts) 2009/05/28 (Thu) @ 16:41

IIRC, there are extenuating circumstances wrt Eddie Yost’s defensive numbers.  Didn’t the Senators pitching staff have a weird composition during his heyday?


#24    Tangotiger      (see all posts) 2009/05/28 (Thu) @ 16:56

terps: make sure you export to a csv file, not Excel.


#25    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 18:22

I’m not sure how you export to a CSV. I tried to export it as a text document, but got a warning that I could only export 65,000 records to the clipboard. I just decided to split the table into 3 tables.


#26    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 19:06

If you want to add even more precision to infielder plays made, you can subtract out estimated double plays turned and a percentage of OF assists (the infield records an assist on approximately 3/8 of OF assists).

So here is what you could use for plays made:

2B: Ast - (.548*2B_DP + .103*OF_AST)
3B: Ast - .07*OF_AST
SS: Ast - (.433*SS_DP + .127*OF_AST)

This gives you an estimate very close to actual plays made:

POS: Actual Plays, Estimated Plays, %
2B: 120262, 123177, .976
3B:  91118, 91746, .993
SS: 125601, 128064, .981
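
For anyone who wants to play with it, a minimal sketch of those three estimates plus a check of the percentage column. It reads 2B_DP and SS_DP as double plays credited at the position and OF_AST as total outfield assists, which is an assumption; the function name is invented.

```python
def estimated_plays(pos, assists, dp=0.0, of_assists=0.0):
    """Assists minus estimated DP pivots and relays on outfield assists."""
    if pos == "2B":
        return assists - (0.548 * dp + 0.103 * of_assists)
    if pos == "3B":
        return assists - 0.07 * of_assists
    if pos == "SS":
        return assists - (0.433 * dp + 0.127 * of_assists)
    raise ValueError("only 2B, 3B and SS are covered here")


# Actual and estimated totals from the table, to recompute the % column.
totals = {"2B": (120262, 123177), "3B": (91118, 91746), "SS": (125601, 128064)}
ratios = {pos: round(act / est, 3) for pos, (act, est) in totals.items()}
```

The recomputed ratios match the table (.976, .993, .981), so the estimates run about 1 to 2.5 percent high before any further scaling.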


#27    Rally      (see all posts) 2009/05/28 (Thu) @ 20:18

What do you think of Bill James’ method of estimating 1B unassisted putouts from Win Shares?

It seems to me like a reasonable enough estimate, depending on how complex you want to make your measure.


#28    Rally      (see all posts) 2009/05/28 (Thu) @ 20:23

I’m using Total putouts - assists (p,2b,3b,ss)*.84

I think it’s from Win shares, or slightly modified.


#29    Colin Wyers      (see all posts) 2009/05/28 (Thu) @ 20:33

Those ideas could work, terps and Rally. The only issue is you need to prorate out to playing time. (Or do claim points, but that introduces a lot of other issues and then I have to come up with another name, because it wouldn’t be simple anymore.) I’ll take a look at it when I get home.


#30    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 20:34

There are 3 formulas that I am aware of that estimate unassisted putouts for first basemen. Of course, they all estimate total unassisted putouts. I am only interested in estimating unassisted putouts on groundballs.

Palmer uses: PO_1B - .84 * (A_2B + A_3B + A_SS + A_P)

Charles Saeger uses: PO_1B - (A_2B + A_3B + A_SS)

James uses the most complex formula: (PO_1B - .7*A_P - .86*A_2B - .78*A_3B - .78*A_SS + .115*(Runners on 1st + SH) - .0575*BIP) * 2/3 + (BIP*.1 - A_1B) * 1/3

I will go ahead and test all three of them using the Retrosheet data from 1989-1999.
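
Here is a side-by-side sketch of the three formulas as transcribed above, in case it helps with that kind of test. The team line at the bottom is invented purely to exercise the functions, and the argument names are my own labels for the inputs.

```python
def palmer(po_1b, a_p, a_2b, a_3b, a_ss):
    """Palmer: 1B putouts minus 84% of other infield and pitcher assists."""
    return po_1b - 0.84 * (a_2b + a_3b + a_ss + a_p)


def saeger(po_1b, a_2b, a_3b, a_ss):
    """Saeger: 1B putouts minus 2B, 3B and SS assists."""
    return po_1b - (a_2b + a_3b + a_ss)


def james(po_1b, a_1b, a_p, a_2b, a_3b, a_ss, on_first_plus_sh, bip):
    """James (Win Shares): two-thirds putout-based, one-third BIP-based."""
    putout_part = (po_1b - 0.7 * a_p - 0.86 * a_2b - 0.78 * a_3b
                   - 0.78 * a_ss + 0.115 * on_first_plus_sh - 0.0575 * bip)
    bip_part = bip * 0.1 - a_1b
    return putout_part * 2 / 3 + bip_part * 1 / 3


# Hypothetical team totals, not real data.
team = dict(po_1b=1400, a_1b=100, a_p=200, a_2b=450, a_3b=300, a_ss=450)
est_palmer = palmer(team["po_1b"], team["a_p"], team["a_2b"],
                    team["a_3b"], team["a_ss"])
est_saeger = saeger(team["po_1b"], team["a_2b"], team["a_3b"], team["a_ss"])
est_james = james(on_first_plus_sh=1500, bip=4400, **team)
```

On this made-up line the three land at roughly 224, 200 and 252, which gives a feel for how far apart they can sit before any comparison with play-by-play counts.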


#31    Rally      (see all posts) 2009/05/28 (Thu) @ 22:07

"then I have to come up with another name, because it wouldn’t be simple anymore."

That’s the way the game works.  You come up with something simple, and if you put it out for feedback, you get a ton of suggestions, many that make sense.  Try and implement them and what you are doing is anything but simple.

Might as well keep the name though.  Dan Fox’s fielding runs is anything but simple.

I’ve had a name for my stuff for awhile.  JAARF.  Just another adjusted range factor.  Still working on the details though.


#32    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 22:10

Palmer’s formula underestimates total unassisted putouts by around 20%. Anyway, I came up with two formulas for first baseman unassisted GB putouts.
First, the simple one using only assists:

=0.35*PO_1B - 0.25*A_P - 0.18*A_2B - 0.32*A_3B - 0.27*A_SS

Here is the more complicated one based on plays made:

First you need plays made:

PM_P: A_P*.926
PM_2B: .976*(A_2B - (.548*DP_2B + .107*A_OF))
PM_3B: .993*(A_3B - .07*A_OF)
PM_SS: .981*(A_SS - (.433*DP_SS + .127*A_OF))

Here is the regression equation:

.38*PO_1B - .253*(PM_P + PM_2B) - .354*(PM_3B + PM_SS)


#33    terpsfan101      (see all posts) 2009/05/28 (Thu) @ 22:19

Mistake above:

PM_2B: substitute .103*A_OF for .107*A_OF

It doesn’t change the regression equation at all.
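
Both versions from #32, with the .103 correction applied, can be sketched like this. The input names and the sample team line are hypothetical; DP_2B and DP_SS are read as double plays credited at the position and A_OF as total outfield assists.

```python
def simple_1b_gb_putouts(po_1b, a_p, a_2b, a_3b, a_ss):
    """Assists-only estimate of 1B unassisted groundball putouts."""
    return (0.35 * po_1b - 0.25 * a_p - 0.18 * a_2b
            - 0.32 * a_3b - 0.27 * a_ss)


def plays_made_1b_gb_putouts(po_1b, a_p, a_2b, a_3b, a_ss,
                             dp_2b, dp_ss, a_of):
    """Estimate based on plays made by the other infielders."""
    pm_p = a_p * 0.926
    pm_2b = 0.976 * (a_2b - (0.548 * dp_2b + 0.103 * a_of))  # corrected .103
    pm_3b = 0.993 * (a_3b - 0.07 * a_of)
    pm_ss = 0.981 * (a_ss - (0.433 * dp_ss + 0.127 * a_of))
    return 0.38 * po_1b - 0.253 * (pm_p + pm_2b) - 0.354 * (pm_3b + pm_ss)


# Hypothetical team totals, not real data.
simple = simple_1b_gb_putouts(1400, 200, 450, 300, 450)
detailed = plays_made_1b_gb_putouts(1400, 200, 450, 300, 450,
                                    dp_2b=100, dp_ss=90, a_of=30)
```

On this invented line the two estimates land within a play of each other (about 141.5 and 142.2), which is at least consistent with both being fit to the same target.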


#34    terpsfan101      (see all posts) 2009/05/29 (Fri) @ 17:20

Colin, I find it strange that you say you want to keep things simple, yet you have 4 different adjustments for batter/pitcher handedness. I got a little carried away in my last few posts. Here are some simpler suggestions.

For plays made by 2B, SS, and 3B just use assists. I would use all assists, with one minor adjustment for 2B and SS. You probably want to subtract the number of DP’s that you think they turned but didn’t start. You could use a fixed number like I suggested, or better yet, devise a dynamic estimate.

For first basemen, you are going to have to estimate unassisted putouts. Again, we are only interested in knowing the number of groundball plays that first basemen made. So your plays made formula for first basemen should be a percentage of unassisted putouts + assists.

My next suggestion is to include a DP rating for 2B, 3B, and SS and an arm rating for OF’s. Palmer’s method for estimating DP opportunities is pretty solid:

The formula for calculating double play opportunities is:

.662 * (H – HR + BB + HB + .575*E)

Individual double plays were divided by the team double play opportunities divided by the league average double play opportunities.
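
In code, that reads roughly as follows. This is a sketch of the formula as quoted; the function names and the sample totals are made up for illustration.

```python
def dp_opportunities(h, hr, bb, hb, e):
    """Palmer's estimate of double play opportunities from totals allowed."""
    return 0.662 * (h - hr + bb + hb + 0.575 * e)


def opportunity_adjusted_dp(player_dp, team_opps, league_avg_opps):
    """Scale a fielder's DPs by his team's opportunities relative to league."""
    return player_dp / (team_opps / league_avg_opps)


# Hypothetical team totals allowed: H, HR, BB, HB, E.
opps = dp_opportunities(h=1450, hr=170, bb=530, hb=50, e=100)
# A fielder with 100 DPs on a team with 10% more chances than average.
adjusted = opportunity_adjusted_dp(100, team_opps=1320, league_avg_opps=1200)
```

A fielder on a team with more runners on first gets marked down, and one on a stingy staff gets marked up, before any run value is attached.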

You also want to split the run-value of a GIDP 50/50 for SS/2B and 3B/2B DP’s. Using Baseruns, I have run values for GIDP’s dating back to 1871.

The run-value of an outfielder assist is approximately the negative run-value of a double. So if a double is worth .8 runs, an outfielder assist is worth approximately -.8 runs.

If your plan is to publish an SQL to be used with the BDB database, you should also simplify your handedness adjustment so it only accounts for pitcher handedness. Keep in mind, that platooning wasn’t that common early in the 20th century. Palmer multiplies his LHP/RHP adjustment by a year factor to account for this. I would just use his method for accounting for platooning:

“YFP is the year factor, necessary because this factor steadily increased in importance from 1910 to 1970. Before 1910 the YF is 0, so no adjustment is necessary. The adjustment can be calculated for each year from 1910 through 1970 by subtracting 1910 from the year in question and then dividing by 60. After 1970 the YF is always 1.”

Also, I wouldn’t apply any regression to your GB/FB adjustment.

If you want help with anything else, don’t hesitate to ask!


#35    Rally      (see all posts) 2009/05/29 (Fri) @ 21:08

Terps, I like your formula in #32 (simple one).  It produces a much tighter range, the team with the fewest has 54.  I was using the Palmer formula before, and got a negative result for one team, zero for another, and a few others impossibly low.  I think I’ll switch.


#36    terpsfan101      (see all posts) 2009/05/29 (Fri) @ 23:55

I actually thought the range was too tight. I looked at including 1B DP’s in the equation to account for duplicate IF assists, but it didn’t improve the accuracy at all. Interestingly, when I included them, the weight for a pitcher and 2B assist was about the same and the weight for a 3B and SS was about the same.


#37    terpsfan101      (see all posts) 2009/05/31 (Sun) @ 03:25

Regression equation for DP’s turned and not started:

DPT_2B = .535*DP_2B + .236*DP_3B - .051*DP_SS
DPT_SS = .060*DP_2B + .050*DP_3B + .482*DP_SS


#38    Dackle      (see all posts) 2009/05/31 (Sun) @ 12:01

Terps, for your DP rating should you not also include the groundball tendency of the pitching staff? A flyball hit with a runner on first and less than two out isn’t really a DP “opportunity”. Might be better for the basic structure of the formula to be:

Estimated runners on first with less than two outs x groundballs put into play

Also, in your formula: .662 * (H – HR + BB + HB + .575*E), I wonder if the accuracy would be improved if “H-HR” were replaced by singles. Doesn’t seem like doubles and triples generate many DP opps.


#39    terpsfan101      (see all posts) 2009/05/31 (Sun) @ 16:29

Yes, Dackle it is probably a good idea to adjust GIDP opportunities by GB/FB ratio. Also, if you had 2B and 3B allowed, it would probably be better to use .662*(1B+BB+HBP+.575*E). If you have reached on errors allowed, then you would obviously substitute them for .575*E.


#40    Dackle      (see all posts) 2009/05/31 (Sun) @ 17:28

Oh I see now—H-HR for those years when doubles and triples allowed aren’t available.

Wonder if it’s better to use x(H-HR) + BB + HP etc, where the x(H-HR) term estimates the number of singles allowed (ie x = league singles/(league H - league HR).

If you have the data at hand, what do you think of the approach of estimating defensive innings using plate appearances? Is it better to use % of total chances (as James does in Win Shares)?


#41    terpsfan101      (see all posts) 2009/05/31 (Sun) @ 18:33

No need to estimate the percentage of hits that are singles for each season, you can get the exact number from the league batting statistics.

For estimating defensive innings you want to use a combination of PA’s and defensive chances. There really isn’t a decent published method out for estimating defensive innings. The method used in the STATS All-Time Handbook works fairly well for infielders, but its accuracy for outfielders is horrible. I don’t know what method Clay Davenport uses for estimating defensive innings (he uses adjusted games), but it is very accurate. That is why I downloaded the Davenport translation pages. I converted his adjusted games into defensive innings using the average number of team IP per game for each season. If you want, I can email you my fielding table.


#42    Dackle      (see all posts) 2009/05/31 (Sun) @ 20:42

I guess I’m just hung up on the .662 coefficient—I don’t see how a walk is .662 of a DP opportunity but so is a double or triple (H-HR). Should the formula be:

.662(H – HR) + BB + HB + .575(E)


#43    terpsfan101      (see all posts) 2009/05/31 (Sun) @ 22:20

You can only have a DP with 0 or 1 out. So you use approximately 2/3 the number of estimated runners on 1st base in the DP opportunities formula.


#44    Dackle      (see all posts) 2009/05/31 (Sun) @ 23:19

Ah, that makes sense. I always figured the DP conversion rate would account for the two-out situations (ie measuring players against a baseline of either .108 DPs for every DP opp, or .086 for every runner on 1st—should be roughly equivalent). Still, probably best to measure the precise # of opps as you’ve done.

I’m still not sure why you aren’t estimating singles. Your formula works out to: .662 * (1B + 2B + 3B + BB + HB + .575*E). I thought maybe you’d do something like .662*(((H-HR)*.747)+BB+HB+(.575*E)), although the .662 would have to increase a bit. If the #s are available it might be worth subtracting out SH, WP, BK, PB, as they also remove potential runners on 1st. SH+WP+BK+PB totalled 3,558 last year vs 1,672 for HP and 1,712 for ROE.


#45    terpsfan101      (see all posts) 2009/05/31 (Sun) @ 23:26

Dackle, it isn’t my formula. It’s Pete Palmer’s formula. I’m working on a better one.


#46    Chris Dial      (see all posts) 2009/06/01 (Mon) @ 11:00

What is the plan to compensate for being off of chances based on BIP data?  How do we know who is a bad fielder and who simply didn’t have balls hit over there?


#47    Colin Wyers      (see all posts) 2009/06/01 (Mon) @ 12:04

I’m still regrouping after my little weekend trip - there’s a lot of great discussion here about figuring out plays made, and I hope to dive into that soon enough.

Chris, there’s only so much we can do to estimate chances without PBP data. (And only so much we can do to estimate chances with PBP data that’s missing hit location, for that matter.) The best improvement is to get more data; there’s been some discussion on the RetroSQL group recently about getting the box score event files in database form, which I think would greatly improve some of the estimates involved.

Beyond that, I think once you’ve done all you can you slap a set of error bars on everything and live with it. I’m certainly not done yet, but I recognize there’s only so much I can do here.


#48    Rally      (see all posts) 2009/06/01 (Mon) @ 22:35

I’m looking for some help if you are a student of baseball history and can provide some knowledge of outfielders who played 1900-1955.

http://lanaheimangelfan.blogspot.com/2009/06/request-for-help.html


#49    terpsfan101      (see all posts) 2009/06/08 (Mon) @ 16:17

For first baseman plays made, I think it is better to estimate total plays made rather than estimating unassisted groundball putouts and adding them to first baseman assists. This new formula works much better for plays made:

PM_1B: -0.099*PO_P + 0.35*PO_1B -0.281*A_P + 0.941*A_1B - 0.137*A_2B - 0.349*A_3B - 0.288*A_SS


#50    terpsfan101      (see all posts) 2009/06/11 (Thu) @ 20:51

Here are the modified fielding runs equations I am going to use. They are baselined against the 1989-1999 run environment (4.63 RPG). Do they look reasonable?

2B/SS: .75*(A - DPT) - .4*E + .2*DP
3B: .8*A - .4*E + .2*DP
1B: .8*PM - .4*E
OF: .85*PO + .75*A - .4*E
P: .7*A - .4*E
C: .55*A - .3*(E+PB)


#51    terpsfan101      (see all posts) 2009/06/11 (Thu) @ 21:26

Obviously the catcher’s formula needs more work, like including CS for modern seasons, and maybe crediting catchers for a small percentage of team pitching runs.


#52    Rally      (see all posts) 2009/06/11 (Thu) @ 22:47

Not sure what you are calculating there.  Derek Jeter last year had 347 assists, 12 errors, 69 DP.  That formula would give him a value of 240 or something (with a guess as to how many of the 69 DP he turned).

So my guess you are applying that to a player’s fielding stats above what an average player would have, in which case it looks good, though perhaps errors should be a bit higher.

And the real question is, how do you figure what an average player would do?


#53    terpsfan101      (see all posts) 2009/06/12 (Fri) @ 01:22

Yes Rally I am comparing fielding stats to an average player. So if a SS is 10 Assists above average, that is 7.5 runs above average. Right after I posted the equations, I realized I made a mistake in calculating the run value of an error. I used the actual run-value for RBOE, instead of comparing it to the run-value of an out. I don’t think I have to do this for non-RBOE errors. Also, I am still working on the adjustments (handedness, GB/FB ratio) that will tell us how an average fielder would do.


#54    terpsfan101      (see all posts) 2009/06/12 (Fri) @ 05:13

Fixed the low run-value for errors:

2B/SS: .75*(A - DPT) - .65*E + .2*DP
3B: .8*A - .65*E + .2*DP
1B: .8*PM - .65*E
OF: .85*PO + .75*A - .45*E
P: .7*A - .55*E
C: .55*A - .35*E - .27*PB
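
To make the "above average" reading concrete, here is a sketch that applies these weights to differences from an average fielder rather than to raw totals. The dictionary layout and function name are invented for the example; the weights are the ones listed above.

```python
# Run weights per stat, by position, from the equations above.
WEIGHTS = {
    "2B": {"A": 0.75, "DPT": -0.75, "E": -0.65, "DP": 0.2},
    "SS": {"A": 0.75, "DPT": -0.75, "E": -0.65, "DP": 0.2},
    "3B": {"A": 0.8, "E": -0.65, "DP": 0.2},
    "1B": {"PM": 0.8, "E": -0.65},
    "OF": {"PO": 0.85, "A": 0.75, "E": -0.45},
    "P": {"A": 0.7, "E": -0.55},
    "C": {"A": 0.55, "E": -0.35, "PB": -0.27},
}


def fielding_runs(pos, above_avg):
    """Runs above average from each stat's difference versus an average fielder."""
    weights = WEIGHTS[pos]
    return sum(weights[stat] * delta for stat, delta in above_avg.items())


# A shortstop 10 assists above average, everything else average.
ss_runs = fielding_runs("SS", {"A": 10})
# An outfielder 10 putouts above average who also made 2 extra errors.
of_runs = fielding_runs("OF", {"PO": 10, "E": 2})
```

The shortstop example reproduces the 7.5 runs mentioned in #53. How the average fielder's totals are estimated is the open part and is not covered here.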



#56    terpsfan101      (see all posts) 2009/06/13 (Sat) @ 22:37

Give me a chance to examine the massive SQL before I comment. It looks like you did the best job possible within the limitations of the Retrosheet data.


#57    Colin Wyers      (see all posts) 2009/06/13 (Sat) @ 23:42

I don’t think that’s true, terpsfan, at least not in the strictest sense. My to-do list, off the top of my head, includes:

* Better park factors, including park factors for infielders
* Separate accounting for fielding of bunts
* Adjustments for:
- Holding a runner
- Potential bunt situation (runner on, less than 2 strikes)
- etc.

A great overview of some of the more detailed adjustments that can be made to a fielding metric is here:

http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2003-03-21_0/

And obviously post-1989 there’s more data to work with, and eventually that should be incorporated. And hopefully by sharing the code, someone who wants to work on one of those - or who has an idea for an improvement I haven’t mentioned - can go right in and do it themselves. That’s the exciting part about publishing the code from my POV.


#58    terpsfan101      (see all posts) 2009/06/13 (Sat) @ 23:44

I just finished the LHP/RHP adjustment. I am borrowing the structure of Palmer’s formula, but not his values:

(1 + ADJ * YF * DLHP).

ADJ:

1B -.3
2B -.165
3B 0.219
SS 0.105
LF -.103
RF 0.129

“DLHP is the difference in the percentage of left-handed pitching from the league average.”

“YFP is the year factor, necessary because this factor steadily increased in importance from 1910 to 1970. Before 1910 the YF is 0, so no adjustment is necessary. The adjustment can be calculated for each year from 1910 through 1970 by subtracting 1910 from the year in question and then dividing by 60. After 1970 the YF is always 1.”

For catchers I am thinking of including non-K putouts. I might exclude catcher assists altogether. Prior to the Retrosheet era (no SB/CS data), catchers who had good arms are going to have fewer assists because they had fewer basestealing attempts against them.


#59    terpsfan101      (see all posts) 2009/06/14 (Sun) @ 11:47

Colin, the only suggestion I would make is to use multi-year park adjustments. This might require a separate table that you could provide, or you might be able to fit it into your query.


#60    terpsfan101      (see all posts) 2009/06/14 (Sun) @ 12:20

I made a mistake in calculating the LHP/RHP adjustment figures. Here are my revised values:

1B: -.434
2B: -.236
3B: 0.308
SS: 0.148
LF: -.144
RF: 0.180

Here are the values Palmer uses:

POS: PO, A
1B: 0.00, -.40
2B: 0.23, -.27
3B: -.22, 0.34
SS: -.10, 0.14
LF: -.16, 0.00
RF: 0.09, 0.00

For the most part our values are similar except for RF.
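
As a rough sketch of how the revised values plug into the (1 + ADJ * YF * DLHP) structure from #58: the function names here are made up for illustration, DLHP is assumed to be a fraction (0.05 meaning five percentage points above league average), and positions with no listed ADJ are assumed to get no adjustment. None of that is spelled out in the quoted definitions.

```python
# Revised ADJ values from this post (terpsfan101's figures, not Palmer's).
ADJ = {"1B": -0.434, "2B": -0.236, "3B": 0.308,
       "SS": 0.148, "LF": -0.144, "RF": 0.180}


def year_factor(year):
    """Year factor as quoted: 0 before 1910, (year - 1910) / 60 through 1970, then 1."""
    if year < 1910:
        return 0.0
    if year > 1970:
        return 1.0
    return (year - 1910) / 60


def lhp_multiplier(pos, year, dlhp):
    """Return 1 + ADJ * YF * DLHP.

    Assumption: dlhp is a fraction (team LHP share minus league LHP share).
    Assumption: positions without a listed ADJ (e.g. CF) are left unadjusted.
    """
    return 1 + ADJ.get(pos, 0.0) * year_factor(year) * dlhp
```

For instance, a post-1970 team running ten points above the league in left-handed pitching would get a 1B multiplier of 1 - .434 * .10, or about .957.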


#61    terpsfan101      (see all posts) 2009/06/18 (Thu) @ 03:21

DP Opps = .73*1B + .66*BB + .37*E + .71*HBP

Trust my math on the .37 figure for errors. 60% of all errors result in the batter reaching base safely (not counting reached on SH). 5/6 of RBOEs result in the batter ending up at first base (.6 * 5/6 = .5). And 5/7 of RBOEs occur with less than 2 out (.5 * 5/7 = .357). I had to apply a 1.05 multiplier to 1B, BB, HBP, and RBOE since I am omitting small stuff like SH, SB, CS, WP, PB, BK. So .357 * 1.05 = .375, in this case rounded down to .37.
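
The arithmetic behind the .37 can be checked with a short sketch. The function and variable names are illustrative only; the fractions and weights are the ones stated in this post.

```python
def error_coefficient():
    """Rebuild the weight on errors from the fractions given in the post."""
    reach_on_error = 0.6     # share of errors on which the batter reaches safely
    ends_at_first = 5 / 6    # share of RBOE that leave the batter on first base
    under_two_outs = 5 / 7   # share of RBOE that occur with fewer than 2 outs
    small_stuff = 1.05       # multiplier covering the omitted SH, SB, CS, WP, PB, BK
    return reach_on_error * ends_at_first * under_two_outs * small_stuff


def dp_opps(singles, bb, errors, hbp):
    """DP Opps = .73*1B + .66*BB + .37*E + .71*HBP, using the post's rounded weights."""
    return 0.73 * singles + 0.66 * bb + 0.37 * errors + 0.71 * hbp
```

error_coefficient() comes out at .375 before rounding, which matches the figure above.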


#62    Dackle      (see all posts) 2009/06/18 (Thu) @ 05:02

What about the groundball tendency of the pitching staff? A flyout with a runner on first isn’t really a DP opportunity. It might be hard to determine groundballs retroactively, as a high ratio of assists to (PO-A-K) would equal a high groundball ratio and hence lots of DP opps, even if the ratio of balls hit on the ground vs in the air was the same.


#63    terpsfan101      (see all posts) 2009/06/18 (Thu) @ 13:56

Yes Dackle, GIDP opps should be adjusted by the team GB rate, and the number of BIP. I’m still working on a formula to estimate GB outs. For FB outs I’m using OF putouts.


#64    terpsfan101      (see all posts) 2009/06/18 (Thu) @ 23:56

I’m going to use Charlie Saeger’s formula to estimate GB outs:

Groundouts = Team Assists - Catcher Assists - Outfield Assists - 1st Baseman DP’s.

For flyouts I’ll just use OF putouts.

Now all the pieces are in place (I think!) for my fielding runs metric. I adjusted the catcher formula that I posted earlier. I already know that I’m going to have to apply some regression. I have yet to decide whether that involves adjusting my equations or the results of those equations.
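
A minimal sketch of the two estimates described above, with hypothetical function and argument names and all inputs taken as team season totals:

```python
def estimated_groundouts(team_assists, catcher_assists, outfield_assists, first_base_dp):
    """Saeger-style estimate: team A minus catcher A, outfield A, and 1B double plays."""
    return team_assists - catcher_assists - outfield_assists - first_base_dp


def estimated_flyouts(outfield_putouts):
    """Flyouts are approximated by outfield putouts alone."""
    return outfield_putouts


def ground_fly_ratio(team_assists, catcher_assists, outfield_assists,
                     first_base_dp, outfield_putouts):
    """Ratio of estimated groundouts to estimated flyouts."""
    groundouts = estimated_groundouts(team_assists, catcher_assists,
                                      outfield_assists, first_base_dp)
    return groundouts / estimated_flyouts(outfield_putouts)
```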


#65    Dackle      (see all posts) 2009/06/19 (Fri) @ 12:37

One thing has always bugged me about using fielding stats to measure ground/fly ratios. Assume that an average fielder turns two-thirds of balls in play into outs. If a pitcher gives up three groundballs and three flyballs and the infield and outfield defense are both average, then the result should be two groundouts, two flyouts and two hits allowed, and a groundout/flyout ratio of 1.00. But ... if the infield defense is perfect and converts every play, then the result is three groundouts, two flyouts and one hit, for a groundout/flyout ratio of 1.50, even though the distribution of groundballs to flyballs is the same—a ratio of 1.00.

If a fielding system then assigned the hits allowed to the different fielding positions, then in the first example, the two hits allowed would be equally split between infield and outfield, 1.00 each. But in the second example, because we assume the ground/fly ratio is 1.50, we assign .6 hits to the infield and .4 to the outfield (to maintain the ground/fly ratio of 1.50). But we know that the hit should be assigned entirely to the outfield, because the infield converted all three of its chances.

So ... I wonder if you can work hits allowed into the ground/fly ratio estimate, and maybe assume singles represent groundball hits and extra-base hits represent fly-ball hits.
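
The six-ball example can be worked through in a few lines. This is only an illustration of the bias, with made-up function names; exact fractions are used so the .6/.4 split comes out cleanly.

```python
from fractions import Fraction


def observed_outs(groundballs, flyballs, if_rate, of_rate):
    """Return (groundouts, flyouts, hits) for given infield/outfield conversion rates."""
    groundouts = groundballs * if_rate
    flyouts = flyballs * of_rate
    hits = groundballs + flyballs - groundouts - flyouts
    return groundouts, flyouts, hits


def split_hits_by_out_ratio(groundouts, flyouts, hits):
    """Naive allocation: share hits between infield and outfield in proportion to outs."""
    infield_share = Fraction(groundouts) / Fraction(groundouts + flyouts)
    return hits * infield_share, hits * (1 - infield_share)


two_thirds = Fraction(2, 3)

# Average infield and outfield: 2 groundouts, 2 flyouts, 2 hits, split 1 and 1.
average = observed_outs(3, 3, two_thirds, two_thirds)
average_split = split_hits_by_out_ratio(*average)

# Perfect infield: 3 groundouts, 2 flyouts, 1 hit, split 3/5 and 2/5,
# even though the infield actually allowed 3 - 3 = 0 hits.
perfect = observed_outs(3, 3, Fraction(1), two_thirds)
perfect_split = split_hits_by_out_ratio(*perfect)
```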


#66    terpsfan101      (see all posts) 2009/06/19 (Fri) @ 13:02

Dackle,

I agree with you 100% that you should evaluate defensive statistics in the context of team hits allowed. Colin does this with his simple zone rating and Charlie Saeger does this with CAD. Obviously, IF’ers are responsible for most singles allowed and OF’ers are responsible for most extra-base hits allowed. But prior to 1974, we lack complete data on singles, doubles, and triples allowed. Even if I had complete data on hits allowed, I still don’t know exactly how I would incorporate it.


#67    Rally      (see all posts) 2009/06/19 (Fri) @ 13:09

It’s a good point, but over a full season you won’t have such a stark misrepresentation of hit distribution as that example.  A good infield will convert 80% of ground balls and a bad one converts 77% or something like that.

It turns out that something as simple as team assists/putouts does a reasonable job of estimating groundball percentage.  You can always test this with recent retrosheet data.


#68    Dackle      (see all posts) 2009/06/19 (Fri) @ 18:38

You’re right—I just had a look at the rankings by GO/AO on baseball-ref and they correlate very highly with GB/FB, although there is the odd exception.

In Win Shares, James argues that Mazeroski and Richie Ashburn weren’t as good as their fielding stats because of the groundball/flyball tendencies of their pitching staffs. I always wondered though—isn’t Mazeroski himself the reason why the Pirates had such a high assist total? Or didn’t the Phillies appear to be a flyball staff because Ashburn was running down so many balls?

I do find James’ argument compelling though, even though he doesn’t discuss the bias of comparing Ashburn to himself (i.e., Ashburn had half of the team’s outfield putouts and thus a huge influence on its ratio of putouts to assists). To summarize:

James notes that Granny Hamner played second base on the same teams as Ashburn, and his “defensive statistics are as bad as Ashburn’s are good”. He also quotes Sport Magazine in 1957 describing Hamner as being “especially good on the double play”. And yet Baseball-ref says the Phils that year turned 84 groundball DPs in 1,075 opportunities—a league-worst 8% conversion rate. Meanwhile, Ashburn led all major league outfielders in putouts (502), assists (18) and double plays (7), but the Gold Gloves went to Mays, Minoso and Kaline.

So ... it’s interesting to look at the hit distribution given up by those Phillies, which supports the view that it’s OK to use Ashburn’s putouts as part of the calculation of ground/fly ratio. James notes that the three lowest team assist totals from 1919 to 1960 were recorded by the 1955, 1956 and 1957 Phils. A summary of their hits allowed:

1955: 864 singles (lg avg 995), 427 extra-base hits (lg avg 356)

1956: 921 singles (lg 970), 486 extra-base hits (lg 370)

1957: 938 singles (lg 996), 425 extra-base hits (lg 399)


#69    Rally      (see all posts) 2009/06/19 (Fri) @ 20:47

For 1957 Hamner is rated at -2 runs on the DP.  Thanks to Retrosheet, our information is pretty good.  We know almost exactly, within the limits of data mistakes, how many groundballs were hit and fielded by an infielder with a runner on first and less than 2 out.  His team turned a slightly below average percentage of those opportunities.

Not all-time bad by any means, but a bit below average.  I suppose the quality of those chances (how were the infielders playing? Were the runners moving on the pitch?) could distort the data and we’ll never know for sure.

There’s a lot more uncertainty in Hamner’s -11 range rating for that year, and Ashburn’s +17, as I’m using a very crude estimate for how many hits each was responsible for.


#70    Dackle      (see all posts) 2009/06/20 (Sat) @ 05:03

That’s interesting. James has the 1957 Phils turning 117 DPs vs 120 expected (but vs a league average of 150). So his ground/fly adjustment appears to work pretty well without PBP data.

How does the rest of the 1957 Phillies infield rate under your system? Baseball-ref has them at -1 runs (1B), -15 (2B), -1 (3B) and -9 (SS). And yet the pitching staff gave up 39 fewer singles than the league average (league = 7968 singles/36452 balls in play = .219 * Phils 4,471 BIP = 977 expected singles vs 938 actual). It’s difficult to see how giving up 39 fewer singles equals an infield that is 26 runs below average. But, maybe the PBP data say something else.
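
For anyone who wants to check the arithmetic, here is a quick sketch using only the numbers quoted above. The names are illustrative, and the calculation makes no adjustment for the staff's ground/fly mix.

```python
def expected_singles(lg_singles, lg_bip, team_bip):
    """League singles per ball in play, applied to the team's balls in play."""
    return lg_singles / lg_bip * team_bip


LG_SINGLES, LG_BIP = 7968, 36452        # 1957 league totals quoted above
PHILS_BIP, PHILS_SINGLES = 4471, 938    # 1957 Phillies

expected = expected_singles(LG_SINGLES, LG_BIP, PHILS_BIP)
singles_saved = expected - PHILS_SINGLES

# Baseball-ref infield ratings quoted above: 1B, 2B, 3B, SS.
infield_runs = sum([-1, -15, -1, -9])
```

Rounded, that gives 977 expected singles, 39 fewer allowed than expected, and an infield total of -26 runs.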


#71    Dackle      (see all posts) 2009/06/20 (Sat) @ 05:21

Oh I see. 977 expected singles, but a GO/AO ratio of .76 vs a league average of .99, so that would drive the expected singles well below the 938 actually allowed.

