THE BOOK--Playing The Percentages In Baseball


Monday, April 21, 2008

WOWY Ripken

By Tangotiger, 10:06 PM

One of my issues under consideration is how to handle the situation when the “without” is a lot smaller than the “with”.  WOWY(*), as you know, looks at a player’s performance with some parameter and without it.  Let’s take the case of Cal Ripken, and some of his pitchers (guys who pitched predominantly, or totally, with the Orioles, with Ripken as their shortstop).  One such pitcher is Jeff Ballard.  He had some 70 innings without Ripken as his shortstop and 700 with.  Normally, I scale the “without” data to the “with” because I want to keep my “with” untouched.  I want his actual innings, and outs, and balls in play to reflect his actual totals.  But, in cases like this, I’m very uncomfortable scaling up the “without” data by a factor of 10.

I have some options here:


1. Break with what I want, and scale the “with” down to the “without”.  That is, allow either one to scale to the other’s lower sample size.  Now, instead of adjusting just the without, I’m adjusting both.  And the “totals” line of the “with” will not match his actual overall totals.  I don’t necessarily like this.

2. Allow scaling up, but only by a certain factor.  I’m thinking 4 times.  Why 4?  I don’t know.  I’m thinking at least 2.  And I don’t want to go 8 or 10 times.  Four sounds reasonable.  So, in this case, I can scale the “without” up to 280 innings.  That still leaves me 420 innings short.  So, I either scale the “with” down to 280, or do something else.

Andy likes to scale things as 2/(1/70+1/700), so I’d scale both to 127.  He’s got statistical theory on his side.  That still leaves me with the same basic problem.

3. Fill in whatever I’m not scaling (that is, the difference between 700 and 127, or between 700 and 280, or between the unscaled 700 and 70) with some fixed average.  That average could be the average for the average pitcher that shortstop had that year, or that he had in his career, or the average pitcher in the league that year, or whatever.
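
To put numbers on the options above, here is a minimal Python sketch using Ballard's 700 innings with Ripken and 70 without.  The function names are made up for illustration, and the cap of 4 is just the factor floated in option 2, not a settled choice.

```python
def scale_to_smaller(with_ip, without_ip):
    """Option 1: scale both sides to the smaller sample."""
    return min(with_ip, without_ip)


def capped_scale_up(with_ip, without_ip, max_factor=4):
    """Option 2: scale the smaller sample up by at most max_factor,
    and never past the larger sample."""
    small, large = sorted((with_ip, without_ip))
    return min(large, small * max_factor)


def harmonic_mean(with_ip, without_ip):
    """Andy's method: 2 / (1/a + 1/b)."""
    return 2.0 / (1.0 / with_ip + 1.0 / without_ip)


with_ip, without_ip = 700, 70  # Ballard with / without Ripken at SS

print(scale_to_smaller(with_ip, without_ip))            # 70
print(capped_scale_up(with_ip, without_ip))             # 280
print(with_ip - capped_scale_up(with_ip, without_ip))   # 420 innings left over
print(round(harmonic_mean(with_ip, without_ip)))        # 127
```

Whichever target is used, the leftover "with" innings (700 minus the target) are what option 3 would have to fill in.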

What do you guys think?

(*) With Or Without You, or since MGL is a stickler about it: Without or With You ... dude, I need a cool sounding name, and WOWee sounds better than WWOU

#1    2008/04/22 (Tue) @ 04:11

I am doing a lot of WOWY (but I call it WWOY so I can call it my own! wink), thanks to you (I love the idea).  I always use either the Andy method (that is the harmonic mean, I think) or the short-cut, which is to weight by the lesser of the two values.  There is little doubt in my mind that that is the best way to do it (either one). 

I know you’d LIKE to be able to use the “with” numbers to do the scaling, but that is just not correct.


#2    tangotiger      2008/04/22 (Tue) @ 08:15

Right, let’s just say that we stick with what is right (the lesser of two, or harmonic mean).

I add up Ripken’s totals, and I end up with his scaled PA as 80% of his actual PA, as opposed to most other SS where I’ll be at 90%.

So, let’s say I end up with:
SS, actual BIP, scaled BIP, outsAboveAveragePerBIP
Ripken, 50000, 40000, +.001
Ozzie, 50000, 45000, +.002

Now, if I want his career gross outs above average, can I just do +.001 times 50000 for Ripken even though I based it on only 40000?  If so, then what we end up doing is scaling back up by 25%!

And in fact, what I am really doing is filling in the gap of the difference between Ballard’s actual 700 IP and the harmonic 127 IP, with a Ripken career average of +.001 for those missing IP.

On the other hand, if I just do +.001 times 40000 for Ripken’s career, then I am only using the harmonic mean, and am calling “I dunno” on Ripken’s other 10000 BIP, as if they never existed.

On the third hand, if I do +.001 times 40000 plus zero times 10000, then I am assuming “average” for the missing PA.  In effect, I am regressing the gap between Ballard’s 700 IP and the harmonic mean of 127 all the way to average.

See what I mean here?  If I try to come up with a career register for Ripken, I’ve got to cut a corner somewhere, implied or otherwise.
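
The three choices can be laid side by side in a short Python sketch, using the illustrative Ripken line from the table (50000 actual BIP, 40000 scaled BIP, +.001 per BIP).  The helper `career_outs` and its `fill_rate` parameter are hypothetical names, not part of any existing WOWY code.

```python
def career_outs(rate, scaled_bip, actual_bip, fill_rate=None):
    """Gross outs above average.

    fill_rate is the rate assumed for the BIP outside the scaled sample.
    None means those BIP are ignored ("I dunno").
    """
    total = rate * scaled_bip
    if fill_rate is not None:
        total += fill_rate * (actual_bip - scaled_bip)
    return total


rate, scaled_bip, actual_bip = 0.001, 40000, 50000

scale_up = career_outs(rate, scaled_bip, actual_bip, fill_rate=rate)  # fill with his own rate
ignore = career_outs(rate, scaled_bip, actual_bip)                    # drop the other 10000 BIP
average = career_outs(rate, scaled_bip, actual_bip, fill_rate=0.0)    # fill with league average

print(round(scale_up, 6))             # 50.0, same as +.001 times 50000
print(round(ignore, 6))               # 40.0
print(round(average, 6))              # 40.0
print(actual_bip / scaled_bip)        # 1.25, the 25% scale-up
print(round(average / actual_bip, 6)) # 0.0008 per actual BIP
```

The last two choices give the same gross total and differ only in the denominator: ignoring the gap leaves the rate at +.001 over 40000 BIP, while filling with average implies +.0008 over all 50000.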


#3    MGL      2008/04/23 (Wed) @ 01:19

I have not thought about it too much, but if you are “filling in” gaps to come up with career totals, you have to fill in with a regressed number based on the number you got, the .001 per PA.  So just figure out his “true talent” defensive level and use that for the other 10000 PA.  How you do that from the beginning, to avoid the two-step process, I am not sure.  I am not even sure what I am saying is right.
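
One way to read this suggestion, sketched in Python under loud assumptions: the observed +.001 is shrunk toward zero with a generic n/(n+k) regression, and the shrunk rate fills the gap.  The thread does not say how much to regress, so `regress_bip` (the k) and its value of 10000 are purely hypothetical, as are the function names.

```python
def regress_to_mean(observed_rate, sample_bip, regress_bip, mean_rate=0.0):
    """Shrink an observed rate toward mean_rate.

    regress_bip is a hypothetical constant: the amount of average
    performance blended in with the observed sample.
    """
    weight = sample_bip / (sample_bip + regress_bip)
    return mean_rate + weight * (observed_rate - mean_rate)


def career_outs_regressed_fill(observed_rate, scaled_bip, actual_bip, regress_bip):
    """Observed rate on the scaled sample, regressed rate on the gap."""
    fill_rate = regress_to_mean(observed_rate, scaled_bip, regress_bip)
    return observed_rate * scaled_bip + fill_rate * (actual_bip - scaled_bip)


# regress_bip=10000 is an illustrative guess, not an estimated value.
total = career_outs_regressed_fill(0.001, 40000, 50000, regress_bip=10000)
print(round(total, 6))  # 48.0
```

With any positive `regress_bip`, the total lands between the 40 outs from filling with average and the 50 outs from scaling straight up.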


#4    Guy      2008/04/23 (Wed) @ 10:29

I don’t see the problem with just scaling back up to 50,000.  Your smaller theoretical sample of 40,000 is created only to maximize the accuracy of your estimate of Ripken’s overall ability, given the limitations of your data (i.e. to avoid giving too much weight to very small samples on either side of the with/without divide).  But once you have that estimate, you apply it to the real numbers of BIP over Ripken’s career, which really did happen.

The question that remains is what if the specific distribution of pitchers Ripken played behind actually gave him more/fewer chances to exhibit his skill?  Let’s say he’s truly +.002 with a LHP on the mound, and just average with a RHP.  And it so happens that he played behind 50% LHP, vs. an average of 30% for most SS.  Depending on how many non-Ripken innings his pitchers had, your harmonic mean method may not give proper credit to Ripken for all the outs he made behind LHPs.  But I’m not sure there’s any practical way to deal with this....
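
The size of the effect in this hypothetical is easy to work out.  A small Python sketch, where `blended_rate` is an illustrative helper and the rates and shares are the made-up ones from the example above:

```python
def blended_rate(share_lhp, rate_lhp, rate_rhp):
    """Overall rate per BIP given the share of BIP behind LHP."""
    return share_lhp * rate_lhp + (1.0 - share_lhp) * rate_rhp


# +.002 behind LHP, average (zero) behind RHP
ripken_mix = blended_rate(0.50, 0.002, 0.0)   # 50% LHP
typical_mix = blended_rate(0.30, 0.002, 0.0)  # 30% LHP

print(round(ripken_mix, 6))   # 0.001
print(round(typical_mix, 6))  # 0.0006
```

So the same fielder would show +.001 with his actual mix of pitchers but only +.0006 with a typical mix, and any weighting that shifts the sample away from his LHP innings moves the estimate toward the lower figure.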


#5    Tangotiger      2008/04/23 (Wed) @ 10:47

Hmmm… this would be analogous to a guy being, say, +3 wins on 600 non-IBB PA, and then he has another 60 IBB.  The way I do it is: 3/600*660.

That is, one guy’s IBB is not worth the same as another guy’s IBB.  I don’t give the same value (zero or +.010 wins or whatever) to each player.  If the IBB is (basically) a break-even proposition, it is break-even based on the circumstances.  And this circumstance is that the batter brings up the win expectancy by +.005 wins, just by showing up to bat.  I wouldn’t give him +.010 if the team decided to walk him.

So, regardless if his performance is -.005 wins, or +.020 wins, I wouldn’t give him a blanket +.010 wins.

Tying this back to what we are talking about, it seems “fair” then to take the “known” performance of +.001 in the known sample (40,000 BIP), and scale that up to 50,000.
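
Both pro-rations are the same arithmetic.  A minimal Python sketch, with `prorate` as an illustrative helper name:

```python
def prorate(value, measured_on, applied_to):
    """Take a total measured on one sample and apply its rate to another."""
    return value / measured_on * applied_to


# IBB analogy: +3 wins on 600 non-IBB PA, applied to 660 PA including the IBB
wins = prorate(3, 600, 660)

# Fielding: 40 outs above average on 40,000 BIP, applied to 50,000 BIP
outs = prorate(40, 40000, 50000)

print(round(wins, 6))  # 3.3
print(round(outs, 6))  # 50.0
```

In both cases the unmeasured part (the 60 IBB, or the 10,000 BIP outside the scaled sample) is credited at the player's own measured rate rather than at a blanket value.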

I could definitely separate this by LHH, RHH, and in fact, that is one of my splits.  That is, when I do WOWY Ripken/Flanagan, I actually do Ripken/Flanagan/RHH.  So, how does Flanagan do while facing RHH with and without Ripken as his SS?

IDEALLY, I would do Flanagan+Yaz with/without Ripken, but sample size will kill me there.

So, I end up doing: Ripken/Yaz/RHP.  That is, how did Yaz do facing RHP with Ripken as the SS and without?

Same deal with the park, where I do PARK/RHP/LHH/Ripken.

