
THE BOOK--Playing The Percentages In Baseball


Friday, November 07, 2008

Pinto’s PMR

By Tangotiger, 03:26 PM

Follow along as Pinto releases his PMR data, day-by-day, through his archive page.

I prefer looking at “Actual minus Predicted”. I think Pinto took our suggestion and showed that in the past, but he doesn’t show it at the moment.  Among others: Utley is +28, Polanco +18, Phillips +13, Cabrera +11, Ellis +8, Scutaro +7, ODawg +6, Weeks +2, Pedroia +2… Iwamura -23.
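“Actual minus Predicted” is just the sum, over a fielder’s balls in play, of the out he actually recorded (1 or 0) minus the out probability the model assigned to that ball. A minimal sketch, assuming each play already carries a model probability; the `Play` record and its field names are hypothetical, not Pinto’s actual data format.

```python
from dataclasses import dataclass


@dataclass
class Play:
    # Hypothetical record: the model's out probability and whether the fielder made the out.
    predicted_out_prob: float
    was_out: bool


def actual_minus_predicted(plays: list[Play]) -> float:
    """Outs made above (positive) or below (negative) what the model expected."""
    actual = sum(1 for p in plays if p.was_out)
    predicted = sum(p.predicted_out_prob for p in plays)
    return actual - predicted


plays = [Play(0.9, True), Play(0.5, True), Play(0.2, False), Play(0.1, True)]
print(round(actual_minus_predicted(plays), 2))  # 1.3  (3 actual outs vs. 1.7 predicted)
```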

Scutaro is also +16 at SS, and teammate Eckstein is -15.  So that confirms what MGL and Dewan found.  I have to believe there’s something going on here.  Is Eckstein’s suckitude making Scutaro look better than he is through the park adjustment?  Are the Jays pitchers giving up easier-to-field balls?  Is he always behind Halladay?  Something.


#1    Rally      2008/11/07 (Fri) @ 20:00

I think he either got a fluky large share of easy ground balls (of a kind that, given the limits of the systems, get grouped in a way that isn’t adjusted for), or else he’s a good fielder who just doesn’t look like one.


#2    MGL      2008/11/07 (Fri) @ 22:41

Tango, I don’t follow you about Scutaro and Eckstein…


#3    MGL      2008/11/07 (Fri) @ 22:47

I hate to pick out individual players, but given that UZR has Uggla as -4 in 08, -15 in 07, and -5 in 06, and I think he has a pretty poor reputation (I don’t know about the Fan Rating), I doubt he is a good fielder and I have to doubt his 08 PMR rating.

What database for the PBP data is David using?

I am not that familiar with his methodology, or at least I forgot what it was, but I vaguely recall that it was essentially the same as UZR and Dewan, no?


#4    MGL      2008/11/07 (Fri) @ 22:50

Another one with a good (great even I think) rep and with very good UZR’s for the last 4 years, yet he is at the bottom of David’s list, is Aaron Hill.

Tulowitzki, same thing.


#5    tangotiger      2008/11/07 (Fri) @ 23:24

http://tangotiger.net/scouting/scoutResults2008.html

Uggla is pretty low among 2B.

***

David uses BIS

***

I was thinking that the park adjustment might be influenced by Eckstein, so that it looks like it’s tough to get an out there, because Eck makes up part of the sample.  However, Pinto wrote that he removes home park fielders from getting the probability out rates, so that’s not it.
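Pinto’s fix, as described, is to build each park’s out rate only from plays where the visiting team was in the field, so the home fielders (Eckstein included) can’t drag the park rate down. A minimal sketch, assuming plays are simple tuples tagged with park, home team, fielding team, and out or not; the tuple layout is hypothetical and this is not Pinto’s code.

```python
from collections import defaultdict


def park_out_rates(plays):
    """plays: iterable of (park, home_team, fielding_team, was_out) tuples.
    Returns the out rate per park, using only plays where the visitors were fielding."""
    outs = defaultdict(int)
    chances = defaultdict(int)
    for park, home_team, fielding_team, was_out in plays:
        if fielding_team == home_team:
            continue  # skip home fielders so their skill can't bias the park rate
        chances[park] += 1
        outs[park] += int(was_out)
    return {park: outs[park] / chances[park] for park in chances}


plays = [
    ("Rogers Centre", "TOR", "TOR", False),  # home team fielding: ignored
    ("Rogers Centre", "TOR", "NYY", True),
    ("Rogers Centre", "TOR", "BOS", False),
]
print(park_out_rates(plays))  # {'Rogers Centre': 0.5}
```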

***

Pinto’s model is closer to WOWY than to UZR or Dewan.  I’m not sure if he uses a smoothing function or not. Probably not.  From that standpoint, what Shane Jensen does with SAFE is the ideal approach.
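To show what a smoothing function buys you: a raw out rate for a thinly populated bucket (say 1 out in 2 chances) is mostly noise, and smoothing borrows strength from neighbouring buckets. This is a plain Gaussian kernel smoother over spray-angle buckets, offered only as a sketch; SAFE itself uses a more elaborate Bayesian model, and the bucket layout and `bandwidth` value here are assumptions.

```python
import math


def kernel_smooth(angles, outs, chances, bandwidth=5.0):
    """Chance-weighted Nadaraya-Watson smoothing of out rates across spray-angle buckets.
    angles: bucket centres in degrees; outs, chances: raw counts per bucket."""
    smoothed = []
    for a0 in angles:
        num = den = 0.0
        for a, o, c in zip(angles, outs, chances):
            w = math.exp(-0.5 * ((a - a0) / bandwidth) ** 2)  # nearby buckets weigh more
            num += w * o
            den += w * c
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


# The middle bucket has a raw rate of 0.5 on just 2 chances; smoothing pulls it toward its neighbours.
smoothed = kernel_smooth([0, 5, 10], [9, 1, 8], [10, 2, 10])
print([round(s, 2) for s in smoothed])
```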

***

If you tell any Jays fan that Marco Scutaro is a better fielder than either Aaron Hill or John McDonald or Scott Rolen, he’d think you are crazy.  If you tell him that Scutaro is better than all three of them, he won’t even talk to you.

I’m going to look at the Fans’ Scouting Report and see how many fans selected Scutaro over any of those three guys.  I have some 40 ballots or so, so it’ll be interesting.


#6    Sean      2008/11/08 (Sat) @ 00:36

Tango, a Jays fan would tell you you’re crazy because he likely isn’t.  He just had a good (“fluky”) year; the true-talent fielding of the others is still better.

It was just in 2006 that UZR had him as -26/150 (and oddly enough tied for worst with A. Hill for that year) and PMR had him dead last at -18 (outs) at SS.

His not-so-great fielding in previous years may have influenced how Jays fans viewed him this year, hence the not-so-great FSR rating for this year.

I would bet Scutaro’s true talent level at SS is no more than average.


#7    MGL      2008/11/08 (Sat) @ 01:22

Again, we’ve got to be careful not to confuse and conflate (and any other word that is appropriate) fielding talent with an actual fluky good or bad fielding year (IOW, he actually WAS good that year), as well as with a fluky good or bad year according to flawed data (IOW, many of the balls that the data “thought” were easy were not, or that it thought were hard were not, etc.).

Interestingly, I am in the process of computing UZR’s with the BIS data, and we’ll see how that compares to the UZR’s from the STATS data.

FWIW, infield park factors aren’t, or at least should not be, that dramatic.  All I do is basically come up with a “speed” factor for the IF (based on how many ground balls get through the IF as compared to the average IF).  I then apply that factor to all IF’ers across the board.
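A minimal sketch of such an infield “speed” factor: the park’s ground-ball hit rate divided by the league rate, then applied uniformly to every infielder’s expected outs. How MGL actually applies the factor isn’t stated; scaling the hit probability (rather than the out probability) is an assumption, and the counts below are made up for illustration.

```python
def infield_speed_factor(park_gb_hits, park_gb, league_gb_hits, league_gb):
    """Park ground-ball hit rate relative to the league rate.
    Above 1.0 means more grounders get through this infield (a 'fast' infield)."""
    return (park_gb_hits / park_gb) / (league_gb_hits / league_gb)


def park_adjusted_out_prob(base_out_prob, factor):
    """Scale the chance of a hit by the park factor (an assumption for this sketch),
    cap it at 1, and convert back to an out probability."""
    hit_prob = min(1.0, (1.0 - base_out_prob) * factor)
    return 1.0 - hit_prob


# Made-up counts: 26% of grounders become hits in this park vs. 24% league-wide.
factor = infield_speed_factor(park_gb_hits=520, park_gb=2000, league_gb_hits=12000, league_gb=50000)
print(round(factor, 3))                                # 1.083
print(round(park_adjusted_out_prob(0.75, factor), 3))  # 0.729
```

Because the same factor is applied to all infielders in the park, it can only shift everyone there up or down together, which keeps the adjustment modest.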

I am working on some other improvements to UZR, such as better park factors.  For example, while the STATS database does not say when a ball hits off the wall (I think that BIS does), I am going to assume that any ball that is more than X feet in any “slice” of the field is off the wall, based on fence distances, and therefore is much harder to catch for obvious reasons, or in some cases, impossible to catch (like high off the Monster).  That will be helpful in fields like Houston, Florida, Texas, and Boston which have very high walls.
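The off-the-wall rule can be sketched as a simple per-slice threshold: if a ball in play travelled at least as far as the fence in its slice, flag it as having hit the wall. The slice names, fence distances, and `margin_ft` tolerance below are illustrative assumptions, not real park dimensions or MGL’s actual X.

```python
# Illustrative fence distances in feet per field slice (not real park dimensions).
FENCE_FT = {"LF": 310.0, "LCF": 380.0, "CF": 400.0}


def is_off_wall(slice_id: str, distance_ft: float,
                fences: dict[str, float] = FENCE_FT, margin_ft: float = 0.0) -> bool:
    """Treat a ball in play that reached the fence distance for its slice
    (minus an optional margin) as having come off the wall."""
    return distance_ft >= fences[slice_id] - margin_ft


balls = [("LF", 305.0), ("LF", 318.0), ("CF", 360.0)]
flags = [is_off_wall(s, d) for s, d in balls]
print(flags)  # [False, True, False]
```

Flagged balls could then be given a much lower out probability, or removed from a fielder’s chances entirely when they are uncatchable (high off the Monster, say).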


#8    Darren      2008/11/08 (Sat) @ 10:04

Last year I was concerned that there was a correlation between high-scoring teams and better PMR rankings.  That would have been a result of using visitor data only: fielders not being able to convert the harder-hit BIP from better-hitting teams into outs (i.e., last year’s best fielding team according to PMR was the Yankees).  That relationship does not appear in 2008, which is encouraging.  I certainly wouldn’t have expected to see the Jays on top if that relationship existed.
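The check described here is a correlation between team offence and team PMR. A minimal sketch using the Pearson correlation coefficient; the team totals are made up purely to show the mechanics, not actual 2007 or 2008 figures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up team totals: runs scored vs. team PMR (outs above predicted).
runs_scored = [900, 850, 800, 750, 700]
team_pmr = [30, 12, 20, -5, 4]
print(round(pearson_r(runs_scored, team_pmr), 2))
```

A value near zero, as Darren sees for 2008, is what you want; a strongly positive value would suggest the visitor-only baseline is flattering teams with good hitters.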


#9    Sky      2008/11/08 (Sat) @ 12:50

Hallelujah!

Interestingly, I am in the process of computing UZR’s with the BIS data, and we’ll see how that compares to the UZR’s from the STATS data.

And again!

I am going to assume that any ball that is more than X feet in any “slice” of the field is off the wall, based on fence distances, and therefore is much harder to catch for obvious reasons, or in some cases, impossible to catch (like high off the Monster).  That will be helpful in fields like Houston, Florida, Texas, and Boston which have very high walls.


#10    4seamer      2008/11/08 (Sat) @ 23:37

How about better advance scouting?  ;)


#11    Tangotiger      2008/11/10 (Mon) @ 17:34

Mike Emeigh rolls up his sleeves based on data provided by Pinto:
http://www.baseballmusings.com/archives/030097.php

I reply:
You have to give credit for positioning over and above what an average fielder would have done.  If the average 2B is going to play a massive shift against Ryan Howard, and the ball is hit right to the 2B, then this is a high-probability play, not a low-probability play.

The question being asked, virtually all the time, is: “Given these circumstances, as best as we can determine them, how would an average 2B have fared?” BIS data comes with a shift field.  That should be part of the parameters list for Pinto.  Or, at the very least, it should be used to exclude those plays altogether, as “cannot determine what an average player would have done”.
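Both options can be sketched in one lookup: key the average-fielder baseline on the shift flag as well as the zone, or drop shifted plays entirely. The zone label, the `shift_on` flag, and the baseline numbers are hypothetical; BIS’s actual field names aren’t given here.

```python
from typing import Optional


def expected_out_prob(zone: str, shift_on: bool,
                      baseline: dict[tuple[str, bool], float],
                      exclude_shifts: bool = False) -> Optional[float]:
    """Average-fielder out probability for a ball in a given zone.
    With shift as a parameter, shifted and unshifted plays get separate baselines.
    With exclude_shifts=True, shifted plays are dropped (None)."""
    if shift_on and exclude_shifts:
        return None  # "cannot determine what an average player would have done"
    return baseline.get((zone, shift_on))


# Hypothetical baselines: a grounder to shallow right field is routine when the 2B is shifted there.
baseline = {("shallow RF", False): 0.30, ("shallow RF", True): 0.85}
print(expected_out_prob("shallow RF", True, baseline))                       # 0.85
print(expected_out_prob("shallow RF", True, baseline, exclude_shifts=True))  # None
```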

***

What Mike did is exactly what needs to be done to validate, or invalidate, a system.  A few of us watched the World Series, and came to a decent consensus on each play.  That’s easy, because we all saw the same thing at the same time.  Some plays are tough to call, but an enormous percentage was easy to call.

If David Pinto is giving Mike Emeigh 8 plays on which he thought Uggla should NOT have gotten an out (say a .125 probability per play, or a total of 1 out per 8 plays), and he got an out on every one of those, almost all because of some easy-to-explain reason, then we have a bias to contend with.  It should be handled.
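The arithmetic behind “we have a bias” is short. If the model were right and the plays independent (an assumption), 8 plays at .125 each should produce about 1 out, and going 8-for-8 would happen with probability .125^8 = 2^-24, roughly 6 in 100 million.

```python
p_out = 0.125   # model's out probability per play, from the example
n_plays = 8

expected_outs = p_out * n_plays   # about 1 out expected
prob_all_outs = p_out ** n_plays  # chance of 8-for-8 if the model were right (independent plays)
print(expected_outs, prob_all_outs)
```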

Now, if the bias is random, then given a large enough sample, it all comes out in the wash.  But no one has said how big this sample needs to be.  Three months of data?  Three years?  What exactly?  No guessing, please.
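Why the sample size can’t be answered by guessing: if the per-play errors are random and independent, their total grows only with the square root of the number of plays, so the average error per play shrinks as 1/sqrt(n). How fast that gets small enough depends entirely on the size of the per-play error, which is exactly the number nobody has measured. A minimal sketch; `per_play_error_sd` is an assumed input (it could in principle be estimated from play-by-play reviews like Mike’s), not a known value.

```python
import math


def noise_sd_of_total(per_play_error_sd: float, n_plays: int) -> float:
    """SD of the summed error over n plays, assuming independent per-play errors."""
    return per_play_error_sd * math.sqrt(n_plays)


def plays_needed(per_play_error_sd: float, target_sd_per_play: float) -> int:
    """Smallest n for which the average per-play error SD (sd / sqrt(n)) is at most the target."""
    return math.ceil((per_play_error_sd / target_sd_per_play) ** 2)


print(noise_sd_of_total(0.5, 16))  # 2.0
print(plays_needed(0.5, 0.125))    # 16
```

Halving the tolerable error per play quadruples the number of plays required, so the answer swings from months to years depending on an input that has to be measured.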

The solution is what Mike did, and what we were doing during the World Series chat: human observation to classify each play as:
0-10% out probability
15-35% out
40-60% out
65-85% out
90-100% out

Just classify each play in one of those 5 buckets.  And to test how good each scorer is, it’s darn easy.  Two ways: (1) how well he matches up to the other scorers, and (2) how well his buckets match the out rates.
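Test (1) can be measured as raw agreement between two scorers on the same plays. Raw agreement looks flattering when most plays fall in one bucket, so this sketch also computes Cohen’s kappa, a standard chance-corrected agreement statistic; kappa is an added suggestion, not part of the original proposal, and labelling the five buckets 1 to 5 is an assumption.

```python
from collections import Counter


def agreement_and_kappa(a, b):
    """Raw agreement and Cohen's kappa for two scorers' bucket labels on the same plays."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n             # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)          # agreement expected by chance
    kappa = (po - pe) / (1 - pe) if pe < 1 else 1.0
    return po, kappa


scorer_a = [1, 1, 2, 3]  # buckets 1..5 = 0-10%, 15-35%, 40-60%, 65-85%, 90-100%
scorer_b = [1, 2, 2, 3]
po, kappa = agreement_and_kappa(scorer_a, scorer_b)
print(po, round(kappa, 3))  # 0.75 0.636
```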

For (2), let’s say he classifies 400 balls in play as a 40-60% out rate.  If the ACTUAL number of outs is outside the 160-240 range, then we know he made a mistake.  He can’t watch 400 balls, presume they are 50/50, expect around 200 outs, and actually have 280 outs.
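Test (2) is a calibration check: does the actual out count for a bucket fall inside that bucket’s range? The sketch also reports how many binomial standard deviations the count sits from the bucket midpoint; that z-score is an added illustration, assuming independent plays at the midpoint rate. For 400 balls at 50%, the SD is 10 outs, so 280 outs is 8 SDs away.

```python
import math

BUCKETS = {  # bucket label -> (low, high) out-probability range, from the list above
    "0-10": (0.00, 0.10),
    "15-35": (0.15, 0.35),
    "40-60": (0.40, 0.60),
    "65-85": (0.65, 0.85),
    "90-100": (0.90, 1.00),
}


def bucket_check(bucket, n_balls, actual_outs):
    """Return (actual outs within the bucket's range?, z-score vs. the bucket midpoint)."""
    lo, hi = BUCKETS[bucket]
    in_range = lo * n_balls <= actual_outs <= hi * n_balls
    mid = (lo + hi) / 2
    sd = math.sqrt(n_balls * mid * (1 - mid))  # binomial SD if the midpoint were the true rate
    z = (actual_outs - mid * n_balls) / sd if sd > 0 else 0.0
    return in_range, z


print(bucket_check("40-60", 400, 200))  # (True, 0.0)
print(bucket_check("40-60", 400, 280))  # (False, 8.0)
```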

All you need is three scorers per game, and we’re good to go.  And don’t tell me that’s too much, since the NHL employs twice as many game scorers, and the NHL brings in half the revenue.  The NHL has been able to find a way to justify the cost.  I can’t believe MLB.com can’t.


#12    birdo      2008/11/11 (Tue) @ 14:57

Again, not to pick out specific players, but I have been curious about the differences in Sizemore over the years.  UZR has him consistently in the top five and solidly above average.  PMR has had him right around average for the past few years.  Any thoughts as to where this difference comes from?  Thanks.


#13    Tangotiger      2008/11/11 (Tue) @ 16:06

With CF, it’s very easy to see the differences.  A 3B will play within a few steps of the same spot for every batter.  A CF will position himself several dozen feet one way or the other.

Depending on how Pinto and MGL handle the expected out rate based on handedness of batter and other parameters, one can easily see a low-prob-out play while the other sees the exact same ball hit as a high-prob-out play.

It also depends on how they treat high pops, low pops, etc, etc.

If they each had the positioning of the fielder, and the hang time of the ball, they would both come out with similar results.  As it stands, they dance around these two parameters by bringing in a whole host of parameters to try to classify each ball in play.
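With those two parameters, the core of a catch model is simple: the distance from the fielder’s starting spot to the landing spot, divided by hang time, gives the speed he needs. A minimal sketch; the logistic curve and its `speed_mid` and `scale` values are hypothetical placeholders, not fitted numbers from any system.

```python
import math


def required_speed(fielder_xy, landing_xy, hang_time_s):
    """Feet per second the fielder must cover to reach the landing spot in time."""
    return math.dist(fielder_xy, landing_xy) / hang_time_s


def catch_prob(fielder_xy, landing_xy, hang_time_s, speed_mid=20.0, scale=2.0):
    """Hypothetical logistic curve: 50% when the required speed equals speed_mid (ft/s)."""
    v = required_speed(fielder_xy, landing_xy, hang_time_s)
    return 1.0 / (1.0 + math.exp((v - speed_mid) / scale))


# Same fly ball (4.0 s hang time), CF shaded toward it vs. shaded the other way.
landing = (40.0, 330.0)
print(round(catch_prob((0.0, 300.0), landing, 4.0), 3))
print(round(catch_prob((-60.0, 300.0), landing, 4.0), 3))
```

The same ball flips from routine to difficult purely on starting position, which is why two systems that guess at positioning differently can disagree so much on a CF.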

We’ll be talking about this until FIELDf/x is implemented many years from now, or until STATS, BIS, or MLB.com buy a $1 stopwatch and mark where the CF is positioned.  We really don’t even need to mark the positioning of every fielder.  We really mostly care about the CF, because of the great variance in his positioning (anywhere between the LF gap and the RF gap).  You can also include SS and 2B, but for now, I’d be happy with just the CF.

Am I really asking for too much?


#14    Peter Jensen      2008/11/11 (Tue) @ 16:32

We’ll be talking about this until FIELDf/x is implemented many years from now, or until STATS, BIS, or MLB.com buy a $1 stopwatch and mark where the CF is positioned.

Tango - In my Hit f/x article comparing the BIS and STATS data for Torii Hunter and Andruw Jones, I found that STATS and BIS couldn’t agree on what ZONE the ball landed in 32% of the time.  What makes you think that they can locate the center fielder’s position by observation any more accurately than they can a ball’s landing location?  Now that MGL has both sets of data, maybe he can test over a wider range of outfielders to see if that amount of inaccuracy holds for them as well.  If it does, you’d better hope for Field f/x soon, because your stopwatch and observation data won’t yield anything useful.


#15    Tangotiger      2008/11/11 (Tue) @ 16:50

This is the article Peter is referring to:
http://www.hardballtimes.com/main/article/is-seeing-believing/

Which we discussed here:
http://www.insidethebook.com/ee/index.php/site/comments/cross_checking_the_data_providers/

***

As for the stopwatch, it would be impossible for one person to mark a ball that was in the air for 1.5 seconds, while another would mark it for 4.5 seconds.  Only an inputting error could get that wrong.  Therefore, “won’t yield anything useful” is wrong in its literal sense.  I’ll be happy for anything within 0.5 seconds.  That shouldn’t be too hard to do at all.
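The “within 0.5 seconds” standard is easy to audit if two people time the same balls. A minimal sketch; the paired hang times are made up for illustration.

```python
def share_within_tolerance(times_a, times_b, tol_s=0.5):
    """Fraction of balls where two timers' hang times differ by at most tol_s seconds."""
    pairs = list(zip(times_a, times_b))
    return sum(abs(a - b) <= tol_s for a, b in pairs) / len(pairs)


# Made-up hang times (seconds) from two timers on the same four balls.
timer_a = [1.5, 3.2, 4.4, 2.0]
timer_b = [1.7, 3.0, 4.5, 2.9]
print(share_within_tolerance(timer_a, timer_b))  # 0.75
```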

I will also point you guys to Robert Dudek’s article in the original Hardball Times Annual.  Maybe someone can petition studes to release that article to the public.  It’s a great piece of research using stopwatches.

As for the location, that is another matter.  Marking a one-square-centimetre location on a 200-square-centimetre computer screen that represents something with 10 million square centimetres of real estate has potential for human error.  That is completely different from a stopwatch, which does not have these reference points to worry about.
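Using the figures above, the scale problem works out as follows: each square centimetre of screen stands for 50,000 square centimetres of field, so a click that is off by 1 cm on screen is off by roughly 224 cm (over 2 metres) on the field. This only restates the numbers given; it assumes the field is drawn to a uniform scale on the screen.

```python
import math

field_area_cm2 = 10_000_000   # real estate being marked, from the comment
screen_area_cm2 = 200         # screen area it is drawn on

area_scale = field_area_cm2 / screen_area_cm2   # field cm^2 per screen cm^2
linear_scale = math.sqrt(area_scale)            # field cm per screen cm
click_error_cm = 1.0
print(area_scale, round(linear_scale * click_error_cm))  # 50000.0 224
```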


#16    Tangotiger      2008/11/11 (Tue) @ 16:52

correction: ...does NOT have these reference points to worry about

