
THE BOOK--Playing The Percentages In Baseball


Monday, May 18, 2009

Overrated RBI guys

By Tangotiger, 02:31 PM

Devil Fingers takes a player’s runs above replacement and compares it to his RBI total to see who stands out.

Since RBI is a counting stat, I would prefer comparing to runs above zero (Runs Created), not above replacement (RAR).  Otherwise, a low RAR guy with lots of PA will look “overrated”, when really he was just “overplayed”.
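A quick sketch of why the denominator matters, using made-up numbers (both seasons below are hypothetical, not real players): when a player's RAR is close to zero, dividing his RBI by RAR produces a huge ratio, even though his RBI total is roughly in line with his runs created.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Season:
    name: str
    pa: int      # plate appearances
    rbi: int     # runs batted in
    rc: float    # runs created, i.e. runs above zero
    rar: float   # runs above replacement


def ratios(s: Season) -> Tuple[float, float]:
    """Return (RBI per run created, RBI per run above replacement)."""
    return s.rbi / s.rc, s.rbi / s.rar


# Hypothetical seasons with identical playing time but very different value.
overplayed = Season("Hypothetical A", pa=650, rbi=90, rc=75.0, rar=5.0)
solid = Season("Hypothetical B", pa=650, rbi=100, rc=110.0, rar=40.0)

for s in (overplayed, solid):
    per_rc, per_rar = ratios(s)
    print(f"{s.name}: RBI/RC = {per_rc:.2f}, RBI/RAR = {per_rar:.2f}")
```

Hypothetical A looks wildly "overrated" against RAR (18 RBI per run above replacement versus 2.5 for B) but only modestly so against RC (1.20 versus 0.91).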


#1    terpsfan101      2009/05/18 (Mon) @ 18:55

I agree with Tango that it would be better to compare RBI’s with runs-created, rather than RAR or RAA. For the most part, RBI’s are a counting stat. It makes more sense to compare them to another counting stat, in this case total runs created.


#2    devil_fingers      2009/05/18 (Mon) @ 19:06

Thanks for the link.

I understand why Runs Above Zero would be more straightforward and more helpful than what I did, which was basically runs above average plus a bunch of other stuff to account for guys below average. I considered it while I was working on this, but didn’t do it because

1) I was inspired to do this by Jonah Keri’s piece in BBTN on RBI, so I wanted to be on a similar scale, which I suppose isn’t that big of a deal, but much more importantly

2) This is the first part of a two-part series, and it covers the more “obvious” approach. The next part will look at the same topic using RE24, which (if I understand it correctly) is on the same scale as lwts above/below average. I want to see whether a significant number of these guys are actually good situational hitters (or at least significantly better than their context-free lwts would indicate). So while the way I did RBI/BRAA is a hack, it makes the comparison with RE24 easier to see, at least from my standpoint. I plan on publishing that piece early next week. It won’t be as comprehensive, since downloading and joining the basic batting and win probability batting leaderboards on FanGraphs season by season from 1973 to 2009 is too much manual work for me.
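For readers new to RE24: on each plate appearance the batter is credited with the change in run expectancy between the starting and ending base-out state, plus any runs that scored on the play; summing those values over a season gives his RE24. A minimal sketch, assuming a run-expectancy table is already available. The four table values below are toy placeholders, not a real RE matrix.

```python
from typing import Dict, Tuple

# A base-out state: (runner on 1st, runner on 2nd, runner on 3rd, outs).
State = Tuple[bool, bool, bool, int]


def re24(re_table: Dict[State, float], start: State, end: State, runs: int) -> float:
    """RE24 credit for one plate appearance: change in run expectancy plus runs scored."""
    # Once the third out is made, the inning is over and run expectancy is zero.
    end_value = 0.0 if end[3] >= 3 else re_table[end]
    return end_value - re_table[start] + runs


# Toy values for illustration only; NOT a real run-expectancy matrix.
toy_re = {
    (False, False, False, 0): 0.50,
    (True, False, False, 0): 0.90,
    (False, False, False, 1): 0.27,
    (True, False, False, 1): 0.53,
}

# Solo home run with the bases empty and nobody out: same state, one run scores.
print(re24(toy_re, (False, False, False, 0), (False, False, False, 0), 1))
# Strikeout with a runner on first and nobody out.
print(round(re24(toy_re, (True, False, False, 0), (True, False, False, 1), 0), 2))
```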

Maybe I’m wrong about this BRAA/RE24 issue, and I’d love to hear suggestions here or at my post at Driveline.

I should add that I wasn’t so much looking for specific players to “pick on”; I’m not worried about the masses thinking that Tony Batista was really good in 2003 or anything. I just wanted to make the general point that one can amass a lot of RBIs and still be a pretty terrible offensive player from the standpoint of lwts. Using the word “overrated” so much probably gives people the wrong idea, and that’s my fault. My original title was the “Carter-Batista project,” which has a certain aesthetic appeal, but I wanted something more straightforward.

I did run a version with wRC this afternoon just for 2008, and while the order is a bit different, it’s the same group of guys that I got originally. I haven’t done it for all seasons from 1973 on yet; the query takes forever and I don’t have the patience right now. Does anyone have suggestions on how to break down SQL queries or reorder things to speed them up? Maybe my computer’s too slow: I can’t really do anything else on it while the queries are running.


#3    terpsfan101      2009/05/18 (Mon) @ 22:51

Devil Fingers, you can convert RAA by the 24 base-out states (RE24) to runs created the same way you convert regular linear weights to runs created: add the league-average R/PA (league runs divided by position-player PA), multiplied by the player’s PA, to the RAA-24 total.
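Put as a formula, RC = RAA + PA × (league runs / league PA): the league-average rate is scaled by the player's own PA before it is added back. A minimal sketch; the function name and the league totals in the example are hypothetical.

```python
def raa_to_runs_created(raa: float, pa: int, league_runs: float, league_pa: float) -> float:
    """Shift a runs-above-average total (lwts or RE24) onto an absolute, runs-above-zero scale."""
    league_r_per_pa = league_runs / league_pa  # average runs per position-player PA
    return raa + league_r_per_pa * pa


# Hypothetical league: 10,000 runs in 80,000 PA (0.125 R/PA).
# A +10 RAA hitter with 600 PA works out to 10 + 0.125 * 600 = 85 runs created.
print(raa_to_runs_created(10.0, 600, 10000, 80000))
```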

Here are some ideas for speeding up your queries. Probably the most important thing you can do is defragment your hard drive. You can also try disabling background programs (anti-virus and anti-spyware scanners, and any other unnecessary applications) while you are running your queries, which might free up some additional memory.

I have the same problem of not being able to do anything else (except surf the web) when running queries in Access. It is definitely an issue when working with play-by-play data. My query to calculate linear weights takes 20 minutes to run. However, it isn’t all that time-consuming when working with the BDB database, unless the queries are complex.


#4    devil_fingers      2009/05/18 (Mon) @ 23:52

Alright, I added three more sheets to the spreadsheets, sorted by RBI/BRC (Batting Runs Created). I didn’t park-adjust for that, which probably would have been better, but the basic idea is still there, and the basic groupings are the same. I hope people will check it out.

Thanks, terpsfan, for the suggestion about RE24; I had wondered about that. Like I said, the groups are the same, and I’m trying to decide how to investigate this in a simple, straightforward way that won’t put me on shaky statistical ground. I think I’ll post totals for a few recent seasons, then go back to some of the “best” and “worst” seasons on the current list and compare their RE24 (whether above/below average or not) with their traditional linear weights and RBI totals. The idea was just to get groups of players who are better or worse than their RBI totals suggest. Part two has more of a potential payoff, since it will hopefully begin to separate the wheat from the chaff as far as “situational hitting” goes. I really don’t know what the results will be.

I’m just using BDB. Is that what you used for the linear weights query? I think I defragged just recently, but I’ll have to go back and check. Thanks for the suggestions.


#5    Colin Wyers      2009/05/19 (Tue) @ 00:25

There’s a lot you can do with either subqueries or temp views/tables. Also, for performance, if you’re using any kind of joins: index, index, index. If either of you has specific query performance problems, try the BaSQL forum (which never really took off, but I still try to keep tabs on it) or go ahead and e-mail the RetroSQL list. Or send me an e-mail, if you like:

pontifexexmachina at hotmail.com

If you don’t hear back from me after a few days, please feel free to e-mail me again reminding me about it. Oftentimes I read an e-mail when I’m at work, mean to reply to it when I can get home and spend some time on it, and then forget about it. (It is something I am working hard on correcting.) So please, if I’m not getting back to you, don’t be afraid to bug me.
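To make the indexing and temp-table advice concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Access, and the table and column names are hypothetical, not the actual Baseball Databank schema. It indexes the join keys and materializes the joined result once in a temp table so later queries don't have to repeat the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: one table of counting stats, one of runs created.
cur.executescript("""
CREATE TABLE batting (player_id TEXT, year INTEGER, pa INTEGER, rbi INTEGER);
CREATE TABLE runs_created (player_id TEXT, year INTEGER, rc REAL);
-- Index the join keys so the join can look rows up instead of scanning.
CREATE INDEX idx_batting_player_year ON batting (player_id, year);
CREATE INDEX idx_rc_player_year ON runs_created (player_id, year);
""")
cur.executemany("INSERT INTO batting VALUES (?, ?, ?, ?)",
                [("playerA", 2008, 650, 90), ("playerB", 2008, 650, 100)])
cur.executemany("INSERT INTO runs_created VALUES (?, ?, ?)",
                [("playerA", 2008, 75.0), ("playerB", 2008, 110.0)])

# Materialize the join once in a temp table so later queries can reuse it.
cur.execute("""
CREATE TEMP TABLE rbi_vs_rc AS
SELECT b.player_id, b.year, b.rbi, r.rc, b.rbi / r.rc AS rbi_per_rc
FROM batting AS b
JOIN runs_created AS r
  ON r.player_id = b.player_id AND r.year = b.year
""")
rows = cur.execute(
    "SELECT player_id, ROUND(rbi_per_rc, 2) FROM rbi_vs_rc ORDER BY rbi_per_rc DESC"
).fetchall()
print(rows)
```

On two rows the indexes make no measurable difference; they pay off on tables with millions of rows, such as play-by-play data, and SQLite's `EXPLAIN QUERY PLAN` will show whether a query is actually using them.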


#6    2009/05/19 (Tue) @ 00:57

I’m not a stats guy, but anyone who knows baseball would take the list of the “underrated” over the “overrated” any day.  A simple stat like this that reinforces the difference between the value of Barry Bonds and the “value” of Joe Carter is a nice tool to have.  Ramon Santiago has 19 RBI this year . . . .


#7    terpsfan101      2009/05/19 (Tue) @ 04:33

Colin is definitely the person to talk to about writing efficient queries. The 20-minute query I was referring to is the one where I calculate linear weights from play-by-play data. Since there are approximately 8 million PBP records, I am not surprised that it takes this long to run. If I ran it on my desktop (Celeron processor with 256 MB RAM), it would probably take over an hour.
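For context on what such a query computes, one common way to derive linear weights from play-by-play data is to take the RE24 value of every plate appearance and average it by event type. A minimal sketch, assuming per-PA RE24 values have already been computed; the rows below are toy values, not real data. In SQL the same step is a GROUP BY on the event column.

```python
from collections import defaultdict


def linear_weights(plays):
    """Average RE24 value of each event type over all of its plate appearances."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, re24_value in plays:
        totals[event] += re24_value
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


# Toy play-by-play rows: (event type, RE24 of that PA). Not real data.
plays = [("HR", 1.4), ("HR", 1.0), ("1B", 0.5), ("1B", 0.4), ("K", -0.3), ("K", -0.2)]
weights = linear_weights(plays)
print({event: round(value, 2) for event, value in weights.items()})
```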


#8    pft      2009/05/20 (Wed) @ 03:16

Funny how the overrated RBI guys seemed to come more from the pre-steroid era, while the underrated tended to come more from the steroid era. Isn’t it likely that what is valued today was not as highly valued in the pre-steroid era? Nobody ever got a big contract for his high walk total in the ’70s. It was all about HRs, batting average and RBIs. In fact, Ted Williams used to get booed for walking.


#9    terpsfan101      2009/05/21 (Thu) @ 03:00

PFT,

If you break down linear weight values into their three components (getting on base, moving runners over, and the inning-killer), you will see that moving runners over becomes less important as the run environment increases. Moving runners over obviously corresponds to RBI’s. So RBI guys like Joe Carter (mediocre OBP, decent SLG) would have more value in a lower run-scoring context.


#10    devil_fingers      2009/05/22 (Fri) @ 13:44

I’m planning on doing “Part Two” next week, and given the comments here from Tango and terpsfan, I’m wondering if you all have any suggestions about how to go about it.

As I mentioned before, I want to go back over the same sort of stuff using RE24 in place of BRAA/wRAA/lwts to see if there’s anything of significance there. Originally, I had planned to just use RE24 the same way I had used lwts above average. I might still, but given the suggestions here, I’m not sure. On one hand, I want to be as “accurate” as possible (with the caveat that I know this isn’t a detailed statistical study); on the other hand, I want to keep enough continuity with the last piece that I don’t confuse the readers.

Perhaps simply taking the “pool” of overrated RBI guys I generated and comparing their context-neutral linear weights with RE24 would be more straightforward than regenerating all the lists?

I’ve already done a bit of the RE24 version of absolute runs created. It’s easy enough using FanGraphs: simply take RE24 + (wRC − wRAA), although you can only export one season at a time, so I’m not going all the way back to 1973 with that one.

Any suggestions are appreciated.



#12    terpsfan101      2009/05/23 (Sat) @ 02:45

Devil Fingers, in your next article I recommend that you mention how dependent RBI’s are on the batting order slot. Perhaps you could include Tom Ruane’s chart on the frequency of the 24 base-out states by lineup slot:

http://www.baseballthinkfactory.org/btf/scholars/ruane/articles/situational_hitting.htm

Since this isn’t an exhaustive analysis of RBI’s, I would just use the “pool of overrated/underrated RBI guys” from your last piece in your next article. Again, I would recommend that you use RC instead of RAR, but for what you are doing here, I can understand the decision to use RAR. Finally, I wouldn’t worry about park adjustments. But if you decide to keep them, you should park-adjust the RBI’s as well.
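The chart terpsfan mentions is essentially a tally of base-out states by lineup slot. A minimal sketch of that tally, plus the share of each slot's PAs that come with runners on base (the raw material for RBI). The function names are hypothetical and the plate appearances are toy data.

```python
from collections import Counter
from itertools import product

# The 24 base-out states: 8 base configurations (each base occupied or not) x 3 out counts.
BASE_OUT_STATES = [(on1, on2, on3, outs)
                   for on1, on2, on3 in product((False, True), repeat=3)
                   for outs in range(3)]


def state_frequencies_by_slot(pas):
    """Count how often each lineup slot (1-9) bats in each base-out state.

    pas: iterable of (lineup_slot, base_out_state) pairs, one per plate appearance.
    """
    counts = {slot: Counter() for slot in range(1, 10)}
    for slot, state in pas:
        counts[slot][state] += 1
    return counts


def share_with_runners_on(counter):
    """Fraction of a slot's PAs that came with at least one runner on base."""
    total = sum(counter.values())
    with_runners = sum(n for (on1, on2, on3, _), n in counter.items() if on1 or on2 or on3)
    return with_runners / total if total else 0.0


# Toy plate appearances, not real data.
toy_pas = [
    (1, (False, False, False, 0)),
    (1, (False, False, False, 0)),
    (4, (True, True, False, 1)),
    (4, (True, False, False, 0)),
]
freq = state_frequencies_by_slot(toy_pas)
print(len(BASE_OUT_STATES), share_with_runners_on(freq[1]), share_with_runners_on(freq[4]))
```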


#13    devil_fingers      2009/05/23 (Sat) @ 21:59

Thanks again, Tangotiger and terpsfan. I have been thinking about the relationship between RE24 and batting order slot. I haven’t quite decided exactly what I’m going to do, but the suggestions are helpful, as are the Ruane links.



#15    terpsfan101      2009/05/27 (Wed) @ 15:05

Nice follow-up to the first article. I like any kind of article that shows how context-dependent RBI’s and/or individual runs scored are. Last year, I asked my dad what the most important stat for a hitter was. I was hoping he would say OBP, but he said RBI’s. My dad has been following baseball for nearly 50 years, and is a very intelligent man (he is a lot smarter than me). Of course, some players and most announcers don’t even realize how overrated RBI’s are. Last year, I remember Terry Kennedy saying the NL MVP should go to Ryan Howard since he led the league in RBI’s.


#16    devil_fingers      2009/06/02 (Tue) @ 12:02

I posted yet another follow-up with lists of the career RBI/wRC of players with more than 1,000 RBI, and with more than 500 RBI, since 1954.

http://www.drivelinemechanics.com/2009/6/1/895126/return-of-the-son-of-the

Some fun and (hopefully) interesting results.
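The career lists boil down to a filter on an RBI threshold followed by a sort on RBI/wRC. A minimal sketch; the function name and the three careers are hypothetical, not real players.

```python
def rank_by_rbi_per_wrc(careers, more_than):
    """Keep careers with more than `more_than` RBI, sorted by RBI/wRC, highest first."""
    eligible = [c for c in careers if c["rbi"] > more_than]
    return sorted(eligible, key=lambda c: c["rbi"] / c["wrc"], reverse=True)


# Hypothetical career totals, not real players.
careers = [
    {"name": "Hypothetical P1", "rbi": 1100, "wrc": 900},
    {"name": "Hypothetical P2", "rbi": 1200, "wrc": 1300},
    {"name": "Hypothetical P3", "rbi": 600, "wrc": 450},
]

for threshold in (1000, 500):
    ranked = rank_by_rbi_per_wrc(careers, threshold)
    print(threshold, [(c["name"], round(c["rbi"] / c["wrc"], 2)) for c in ranked])
```

Lowering the threshold lets shorter careers into the pool, which can reshuffle the top of the list, as Hypothetical P3 does here.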


#17    terpsfan101      2009/06/05 (Fri) @ 20:09

Interestingly, the RBI guy appears to be a modern phenomenon. Looking at all players with 1000+ RBI’s since 1871, the top 11 in RBI/RC ratio are all from the Retrosheet era. In the top 20, the only players from before the Retrosheet era are Yogi Berra, Del Ennis, and George Kelly. Even if I lower the criterion to 500+ RBI, the list is still dominated by Retrosheet-era players. I guess a big reason for this is that more errors were committed back in the day, and RBI’s aren’t awarded when a run scores as a result of an error. The ratio of RBI to runs scored gets lower and lower the farther back you go.

