THE BOOK--Playing The Percentages In Baseball


Monday, November 05, 2007

Bill James, Online

By Tangotiger, 10:14 AM

In the new Bill James 2008 Handbook, James refers to “Bill James Online”.  If you google that you get: Batter Profiles, among a host of profiles on the left, and other navigation on top.  It’s still in Beta mode, but this is the internet, and “Beta” means “production-unready”.  You can’t hide.  The search button is not working, but Google is.  Try this:
site:billjamesonline.net “fielding bible”
A couple of clicks gets you here and an impressive list of excerpts.  (In the 2008 Hardball Times Annual, I will also have a Jeter fielding article.) You can also register on their site, but, if you can work Google, you can probably get by.  Feels just like it did 25 years ago.

I will say that it probably would have been better if Bill James bought out Sean Forman.


SabermetricsData
#1    Tangotiger      (see all posts) 2007/11/05 (Mon) @ 10:35

Rich Lederer has a focus for the Handbook here:
http://baseballanalysts.com/archives/2007/11/the_bill_james.php

And he writes:

The 2008 Handbook also includes team baserunning. As James writes, “There will be a time in the future, probably not too long from now, when this baserunning data will be published for all teams and all players over the last 50 years. When that happens, we’ll be in a better position to understand the role of baserunning (other than base stealing) in creating runs. This data is the first step along that road.”

I wrote to Bill and John that this data does already exist at the team level:
http://www.knology.net/~johnfjarvis/stats.html

For example, the 1957 NL:
http://www.knology.net/~johnfjarvis/datapages/stats1957n.html#10

And I have data at the era level:
http://tangotiger.net/destmob1.html

And the difference between 1978-1990 and 2000-2006:
http://tangotiger.net/destmob0.html

The 2000-2006 data came from Pizza Cutter.  I’ve got my Retro database all set to go, so I’ll be in a great position to spit out more of this stuff.  I’m going to look at park splits along these lines as well, through time.


#2    Tangotiger      (see all posts) 2007/11/05 (Mon) @ 16:45

Bill James invents FIP:
http://www.billjamesonline.net/ArticleContent.aspx?AID=173&Code=James01006

Innings Pitched Times 3, minus strikeouts
Divided by 3
Plus Walks
Plus Hit Batsmen
Plus 4 times Home Runs
Times 9
Divided by Innings Pitched
...
-5
times
.424

Which is:
( ( IP-K/3+BB+HBP+4*HR )/IP*9-5 )*.424

Which is:
( IP-K/3+BB+HBP+4*HR )/IP*9*.424-5*.424

Which is:
( IP-K/3+BB+HBP+4*HR )/IP*3.82-2.12

Which is:
( 1+(4*HR+BB+HBP-K/3)/IP )*3.82-2.12

Which is:
3.82 + ( (4*HR+BB+HBP-K/3)/IP )*3.82-2.12

Which is:
( (12*HR+3*BB+3*HBP-K)/IP )*1.27 + 1.70

Compare this to:
(13*HR+3*BB+3*HBP-2*K)/IP + 3.20

(The 3.2 is a floating value that you set for the league.)

Maybe one of you can double-check my math.  But, pretty darn close, no?
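
A quick numeric check of the algebra above can be sketched in Python. The season line is made up purely for illustration; the function names are hypothetical, and the "simplified" form uses the unrounded coefficients (1.272 and 1.696) rather than the rounded 1.27 and 1.70.

```python
def james_fip(ip, k, bb, hbp, hr):
    """The Bill James formula, step by step as listed above."""
    return ((ip - k / 3 + bb + hbp + 4 * hr) / ip * 9 - 5) * 0.424


def james_fip_simplified(ip, k, bb, hbp, hr):
    """The final rearranged form, with unrounded coefficients."""
    slope = 9 * 0.424 / 3               # 1.272, shown above as 1.27
    constant = 9 * 0.424 - 5 * 0.424    # 1.696, shown above as 1.70
    return (12 * hr + 3 * bb + 3 * hbp - k) / ip * slope + constant


def tango_fip(ip, k, bb, hbp, hr, league_constant=3.20):
    """The 13/3/-2 version; the constant floats with the league."""
    return (13 * hr + 3 * bb + 3 * hbp - 2 * k) / ip + league_constant


# Hypothetical season line, for illustration only.
line = dict(ip=200.0, k=180, bb=50, hbp=5, hr=20)
print(round(james_fip(**line), 3),
      round(james_fip_simplified(**line), 3),
      round(tango_fip(**line), 3))
```

If the rearrangement is right, the first two numbers printed should agree for any input line, not just this one.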

Here’s someone else independently arriving at the same thing:
http://members.cox.net/~harlowk22/DIPS-GS.html

I’m definitely not looking for credit.  What I did was simply DIPS-lite.  It was repeated by Kevin Harlow above.  And Clay at Sports Mogul did the same several years ago.

Hopefully, this BJO product of Bill’s will finally be the mechanism to expose Bill to the work that’s being done elsewhere on a two-way street.


#3    Tangotiger      (see all posts) 2007/11/05 (Mon) @ 17:26

Btw, you should definitely become a Beta user.  Being a beta user is free.  But, if you sign up for 3 months (total of $9), you get 12 months of access.

I always felt that charter subscriptions should be of this model.  It’s a big thank you to whoever was in there in the beginning.


#4    Tangotiger      (see all posts) 2007/11/05 (Mon) @ 18:05

I ran Bill’s equation (changing his .424 multiplier to .455) and my FIP (using 3.07 for the constant) against my career database of 1993 and later starters, and here’s the top 20 and bottom 10:
bjFIP ttFIP Player
2.87 2.78 Pedro Martinez
3.13 2.88 Randy Johnson
2.82 3.18 Greg Maddux
3.00 3.19 Kevin Brown
3.22 3.22 Curt Schilling
3.06 3.23 Roy Oswalt
3.44 3.37 Roger Clemens
3.75 3.45 Mark Prior
3.58 3.46 Jake Peavy
3.40 3.49 Brandon Webb
3.78 3.51 Scott Kazmir
3.66 3.53 Erik Bedard
3.36 3.59 Roy Halladay
3.55 3.60 Ben Sheets
3.51 3.60 Mike Mussina
3.37 3.64 Bret Saberhagen
3.76 3.65 Josh Beckett
3.55 3.65 Shane Reynolds
3.53 3.68 Andy Pettitte
3.69 3.73 John Lackey
3.71 3.74 C.C. Sabathia
...
5.41 5.06 Kevin Foster
5.34 5.08 Ryan Rupe
5.19 5.08 Ramon Ortiz
5.49 5.11 Kazuhisa Ishii
5.24 5.15 Jamey Wright
5.37 5.20 Jim Parque
5.64 5.36 Paul Abbott
5.68 5.43 Scott Elarton
5.77 5.67 Dennis Springer
6.13 5.97 Mike Moore

They are roughly similar.

Bill James’ top 10 pitchers had a simple average of 3.15 using his FIP equation with a real-life ERA of 3.40.

My top 10 in FIP was 3.26, with a real-life ERA of 3.20.

This may be because I changed his multiplier when I should have changed his “-5” part.

The big difference is our treatment of the strikeout.  For the top 10 in strikeouts per BFP, here’s how we see them:
BJ: 3.78
me: 3.54
actual ERA: 3.55


#5    Tangotiger      (see all posts) 2007/11/05 (Mon) @ 18:07

When I take the top 200 pitchers in BFP since 1993, and I correlate the Bill James FIP version against my version, I get an r=0.96, which essentially means we’ve got a really really similar equation.


#6    Fargo      (see all posts) 2007/11/05 (Mon) @ 18:31

Tango, you should also run a regression and see whether the intercept (constant) is significantly different from 0, and the slope significantly different from 1.0.  You could have R=1.0 and still have different slopes, intercepts, and mean values.
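
To illustrate the point about correlation versus slope and intercept, here is a minimal sketch in plain Python. The numbers are invented, and the second metric is constructed as an exact linear function of the first, so r comes out at 1 even though the two metrics never agree on scale.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_line(xs, ys):
    """Least-squares slope and intercept for ys ~ slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up values: metric_b is an exact linear function of metric_a.
metric_a = [2.8, 3.2, 3.6, 4.1, 4.7, 5.3]
metric_b = [1.2 * a - 0.5 for a in metric_a]

r = pearson_r(metric_a, metric_b)
slope, intercept = ols_line(metric_a, metric_b)
print(f"r={r:.3f} slope={slope:.2f} intercept={intercept:.2f}")
```

A perfect r here says only that the two metrics rank and space pitchers identically; the slope of 1.2 and intercept of -0.5 show they still disagree on the actual values.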


#7    Tangotiger      (see all posts) 2007/11/06 (Tue) @ 10:39

There’s an 8% difference in the slope, and 0.35 in the intercept.


#8    Tangotiger      (see all posts) 2007/11/06 (Tue) @ 16:36

Here’s another Bill James article, on the best relievers ever:
http://www.billjamesonline.net/ArticleContent.aspx?AID=171&Code=James01008


#9    MGL      (see all posts) 2007/11/06 (Tue) @ 16:51

Re: James, I wrote this on BTF:

I don’t know why James continues to be such a Jeter apologist. He has explained for at least 2 years now, how Jeter has these incredibly bad defensive numbers, something that many of us have known for 5 years or so, yet whenever he is asked if he “personally believes that Jeter is the worst SS in baseball,” he always says no.

Now, with Hanley Ramirez in the picture, he might not be. But before Hanley, it has always been a clear tossup between M. Young and Jeter. I mean, other than Ramirez, who does he think is a worse SS?

And he still thinks that Jeter is the best overall SS in baseball? What? O.K. we are in a cycle of a dearth of great SS, but has he never heard of Jose Reyes? They are probably close to 30 runs (per season) apart in defense. Does James truly think that Jeter is 30 runs better in leadership, offense, and whatever other nonsense he includes in Jeter’s overall value? And that is not to mention age of course. And what about Reyes’ basestealing and baserunning? If you include that (which you must of course), they are not even close overall.

I think that James tries to be too PC sometimes.


#10    MGL      (see all posts) 2007/11/06 (Tue) @ 19:56

I also wrote this (which I have said many times before) in response to the usual (tired) argument that James is only giving deference to the “people” who say by watching Jeter that he is anywhere from average to brilliant (does anyone credible say that he is brilliant anymore?):

First of all, there are very few methodological problems, if any, with the advanced PBP metrics in the IF (e.g., no significant park factors to worry about, no “discretionary” plays), so if you have a large sample of data, these advanced metrics really do pretty much tell you how a fielder has done relative to his peers (of course).  I mean you can make all kinds of piddling arguments AGAINST PBP metrics for the IF (the OF is a different story), and they are not going to amount to a hill of beans.  Unless of course you don’t have the capacity to understand the methodology of these PBP metrics in the first place.

Using a combination of scouting and metrics is fine, but…

Again, for the IF only, once you get 4 or 5 years of data, adding scouting really doesn’t tell you anything you don’t know from looking at the data (although it is possible of course that in rare instances, the stuff that goes into the data does not EVEN OUT even in large samples - that is why it is called “sample data”).

On top of that, and this is one of the ironies of the whole Jeter situation, when people talk about Jeter being an average or better SS according to traditional sources, they are NOT talking about SCOUTING!  They are talking about guys like Michael K, who have little idea what they are talking about (in this regard - I actually like him), and the mainstream media, who have even less of an idea what they are talking about.  If you ask “real” scouts (including the fan scouting report, which is probably about as good as you can get as far as scouting is concerned), I think they will tell you, collectively, that Jeter is decidedly less than average.

So, for James, or anyone else, when they say that Jeter is not as bad as his data indicate, there is simply no credible evidence to support that claim.  The “evidence” is “that group of people” that say that he looks anywhere from average to brilliant when they watch him.  What kind of evidence is that?  And as I said, even if the “scouts” truly said that Jeter is “average at worst,” there is no credible reason to give that much weight when we have 5 years of data that says otherwise.  James should know all that.

In fact, he wrote a brilliant article in the first Fielding Bible “explaining” why Jeter has been in fact so poor on defense, and how the data is so reliable.  Why would he then say, “I personally don’t think that Jeter is so bad.” The only reason I can think of is his desire to be PC (either consciously or not) or that he too has been fooled by the media.  As I have always said, I consider James to be a brilliant thinker and writer, but not much of a scientist.

Finally, the only way we assess players is by data.  No one says that A-Rod is one of the greatest hitters in baseball, or Bonds, or that Ruth was one of the greatest of all time because of the way they look when we watch them.  We say it because of the statistical evidence. The same should go for defense in this day and age when the PBP defensive metrics are pretty darn good (not perfect of course).  Honestly, and I am biased, if there is any reasonable discussion about the “accuracy” of the PBP defensive metrics, it should be limited to the OF, or perhaps at worst, 1st base in the IF.


#11    Tangotiger      (see all posts) 2007/11/06 (Tue) @ 21:37

That brilliant Jeter v Everett article is linked to from the main blog entry (look for the “excerpts” link).


#12    MGL      (see all posts) 2007/11/07 (Wed) @ 20:15

Yes, and that is a link from the 07 Bible, not this year’s.

As I said, and I will say it again, you CANNOT write an article (Jeter versus Everett) like that and then, with a straight face, say, “I don’t think Jeter is that bad of a defensive player (which James has said many times).” And, “He is still probably the best SS in baseball.” I think that he has even said (and I may be wrong) that Jeter is one of the best players in baseball.  As I have shown many times, Jeter is not one of the best 20 players in baseball.  And is probably THE most overpaid player in baseball, by far.  Did I say, by far?  Based on his Slwts values, that is.  If teams want to pay a lot for fan attraction, good looks, leadership, and what have you, maybe he is underpaid, I don’t know.  I personally don’t get paid for making, nor am I even remotely qualified to make, those kinds of evaluations.


#13    Tangotiger      (see all posts) 2007/11/09 (Fri) @ 18:41

This is what I did:

1. Take all pitchers since 1993 with at least a sum total of 1000 BFP.  There were 727 pitchers.

2. Figure their K/IP, (BB-IBB+HBP)/IP, HR/IP and ERA

3. Run a simple regression of those three elements against ERA.

The results:
r=0.85

FIP = (13.0*HR + 3.2*BB - 2.2*SO)/IP + 3.22

(The “BB” refers to BB-IBB+HBP.)

So, that’s the evidence of why the 13/3/-2 works so well.

4. Repeat the regression, but run against RA.

The results:
r=0.84

FIP = (12.3*HR + 3.7*BB - 2.5*SO)/IP + 3.75

Here, it gets interesting.  The run value impact of the BB and K increases, while the HR decreases.  Very interesting.

Anyway, I don’t see any reason to go with the Bill James version.  One of the above two is what you want.
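
The regression procedure in steps 1 to 3 can be sketched roughly as follows. This is not the actual Retrosheet-based run: the pitchers are synthetic, the "true" weights (13, 3, -2, constant 3.2) and the noise level are assumptions baked into the fake data, and the helper names `solve` and `fit` are hypothetical. It only shows the mechanics of regressing ERA on HR/IP, BB/IP and SO/IP.

```python
import random


def solve(a, b):
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, targets):
    """Ordinary least squares via the normal equations.

    Each row is (HR/IP, BB/IP, SO/IP); a leading 1 is added for the constant.
    Returns [constant, w_hr, w_bb, w_so].
    """
    xs = [[1.0] + list(row) for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic pitchers: random per-inning rates, with ERA generated from the
# assumed weights plus noise. Not real data.
rng = random.Random(0)
rows, eras = [], []
for _ in range(500):
    hr = rng.uniform(0.05, 0.20)
    bb = rng.uniform(0.2, 0.6)   # stands in for (BB - IBB + HBP)/IP
    so = rng.uniform(0.4, 1.2)
    rows.append((hr, bb, so))
    eras.append(13 * hr + 3 * bb - 2 * so + 3.2 + rng.gauss(0, 0.3))

constant, w_hr, w_bb, w_so = fit(rows, eras)
print(f"FIP = ({w_hr:.1f}*HR + {w_bb:.1f}*BB + {w_so:.1f}*SO)/IP + {constant:.2f}")
```

Swapping the ERA column for RA would be the equivalent of step 4.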


#14    tangotiger      (see all posts) 2007/11/09 (Fri) @ 20:59

One of the things that happens when we use “IP” is that it masks a bit what’s happening with the K.  What if, instead of dividing by IP, we divide by “BFP/4.29”? 

We obviously don’t need the 4.29 there, but I’m trying to keep the scale the same.

Regressing against RA, we get:

FIP = (11.6*HR+3.7*BB-2.7*K)/(BFP/4.29) + 3.99
r=.82

And against ERA:
r=.83
FIP = (12.5*HR +3.3*BB - 2.4*K)/(BFP/4.29) + 3.41

Use whatever floats your boat.


#15    tangotiger      (see all posts) 2007/11/09 (Fri) @ 21:07

Note that in all these posts, BB refers to BB-IBB+HBP, and BFP refers to BFP-IBB.
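
Putting the BFP-based version and the two definitions above together, a minimal sketch might look like this. The function name and the sample season line are hypothetical; the default weights are the RA fit from comment #14, and the ERA fit can be passed in instead.

```python
def fip_per_bfp(hr, bb, ibb, hbp, k, bfp,
                weights=(11.6, 3.7, -2.7), constant=3.99,
                bfp_per_inning=4.29):
    """FIP scaled by batters faced instead of innings pitched."""
    walks = bb - ibb + hbp      # "BB" in these posts means BB - IBB + HBP
    batters = bfp - ibb         # "BFP" in these posts means BFP - IBB
    w_hr, w_bb, w_k = weights
    innings_equivalent = batters / bfp_per_inning
    return (w_hr * hr + w_bb * walks + w_k * k) / innings_equivalent + constant


# Made-up season line, for illustration only.
season = dict(hr=20, bb=50, ibb=5, hbp=5, k=180, bfp=830)
ra_scale = fip_per_bfp(**season)
era_scale = fip_per_bfp(**season, weights=(12.5, 3.3, -2.4), constant=3.41)
print(round(ra_scale, 2), round(era_scale, 2))
```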


#16    tangotiger      (see all posts) 2007/11/09 (Fri) @ 21:09

I also noticed something interesting.  The leaders in IBB since 1993 are:
Glavine: 104
Maddux: 102
Smoltz: 66

And of the top 10 in BFP since 1993, you get this list:
Player IBB
Tom Glavine 104
Greg Maddux 102
Mike Mussina 24
Randy Johnson 24
Jamie Moyer 41
Roger Clemens 27
Kenny Rogers 21
Curt Schilling 31
David Wells 39
Tim Wakefield 28

Quite a difference in approach, isn’t it?


#17    MGL      (see all posts) 2007/11/10 (Sat) @ 13:54

Two things.  I think that IBB decisions are almost always 95% the result of the skipper.  We hear from commentators how the manager sometimes goes out to the mound and asks certain pitchers whether they want to pitch to a certain batter, but by and large I think the manager makes the decision.

I vaguely recall that ATL led the league in issuing IBB this year?  So the Glavine, Smoltz, and Maddux high IBB are probably about their manager and not the pitchers themselves.

Two, with that many IBB, most of them are probably “wrong.” I wonder how much in ERA and wins that cost those pitchers. Probably not much, but something. 

I always wondered how much Maddux’ (light-hitting) personal catchers have cost him in wins over his career.  Maybe nothing of course, if he is indeed more comfortable with them and would not have pitched as well without them.  But we could say the same thing about anything.  What if a religious pitcher refused to pitch on Sundays throughout his career and he said he could but he wouldn’t feel comfortable?  IOW, saying, “What if Maddux didn’t feel uncomfortable with his best hitting catchers throughout his career,” doesn’t really mean anything, since he did and that is part of his “skill set” just like the fact that he never was (at least late in his career) much of a high pitches per start guy, which no doubt affected his win total, is part of his skill set.


#18    tangotiger      (see all posts) 2007/11/10 (Sat) @ 17:11

http://www.baseball-reference.com/pi/psplit.cgi?n1=maddugr01

Go to the bottom.  Maddux’ career OPS against with Javy (the guy he didn’t want) was .572.  Perez, the guy he did want, was .601.

My guess is that it’s a lot of b.s.


#19    MGL      (see all posts) 2007/11/10 (Sat) @ 21:30

Before that it was Charlie O’Brien (his light-hitting personal catcher).

Well, what do you say if you are the manager and the best pitcher on the planet wants to pitch to his personal catcher?


#20    tangotiger      (see all posts) 2007/11/10 (Sat) @ 22:58

I let him do it of course.  I have no idea if I’ll have a Rick Ankiel on my hands or not, and I have to rest my star catcher at least 20 games anyway.


#21    Tangotiger      (see all posts) 2007/11/15 (Thu) @ 15:48

I calculated each pitcher’s BaseRuns-based ERA (basically, their component ERA, using BaseRuns as the framework).

I then ran a correlation between this component ERA and my FIP and James’ FIP.  The r was .90 with mine, and .85 with James’.

When I run it against ERA itself, my FIP is r=.84, and James’ is r=.80.


#22    MGL      (see all posts) 2007/11/15 (Thu) @ 23:55

Since FIP is designed to regress (100% I guess) non-hr hits, I don’t know whether a good FIP should have a high or low correlation with ERA or component ERA, so I don’t know what those results are supposed to mean.

Since FIP is supposed to be a better predictor of future ERA than ERA (but not necessarily ERC), then what you want to do is test your FIP and James’ FIP with ANOTHER year’s ERA.  The one that correlates the best is probably better.  Of course, as we all know, but sometimes forget, these comparisons are not very enlightening unless we know the standard error of the difference in the estimates (the statistical significance of the results).


#23    studes      (see all posts) 2007/11/17 (Sat) @ 15:03

Bill James does linear weights:

http://billjamesonline.net/toursite/HottestHitters.asp

Not too impressed with his weights, but this is the sort of thing he’s great at: coming up with a “temperature” scale to define who’s “hot” is cool.

Get it?


#24    studes      (see all posts) 2007/11/17 (Sat) @ 15:09

I think at least half the point of both the site and Gold Mine will be their statistics.  Evidently, they feel they’ve come up with some unique way of looking at baseball stats.  Here’s an example on the site:

http://billjamesonline.net/toursite/RBI-Analysis.aspx

That is pretty interesting, though I don’t have any interest in RBI’s myself.  Don’t know if you’ve noticed, but the nickname for their venture is “Be Jolly,” which is what bjol sounds like when you say it out loud, kind of.


#25    Tangotiger      (see all posts) 2007/11/20 (Tue) @ 14:50

http://www.billjamesonline.net/ArticleContent.aspx?AID=81&Code=James01037

I’ll get to the point. The formula I settled on is
2 HR + .25 OTB = RBI

Which can also be written as:
TB/4 + HR = RBI

I responded as:

The only difference between a triple and a HR for RBI, is that you have one fewer run.  And since you have about 0.60 - 0.65 runners on base on average, you’d expect a triple to drive in around 0.60ish runners, and the HR will drive in those same runners, plus himself.  So, the expected gap in RBI between a triple and a HR is exactly 1.  You are showing 1.25 as the difference.

However, there are more runners for the middle of the order, and there are more HR hit in the middle of the order.  So, what we have here is an inference.  Knowing that a guy hits lots of triples, he’s probably a leadoff hitter, and therefore, he doesn’t get to see many runners on base.  Knowing that a guy hits lots of HR, he’s probably middle of the order, and therefore, sees lots of runners on base.

The HR therefore has a proxy value, in addition to its own value.

The 3B has an RBI value of around 0.60ish, and the HR has an RBI value of 1.60ish.  However, if you are finding that giving the HR an overall value of 2.00 is correct, then you are really inferring that you have a proxy value of 0.30ish RBI per HR, as a way to describe something that is not in your dataset (his lineup slot).

I should also point out that 3B and HR are non-random events: there are more HR hit with bases empty than expected.
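
The two forms of the quoted formula, and the 1.25 gap between a triple and a HR, can be checked with a few lines of Python. One assumption is needed: that OTB means total bases on hits other than home runs, i.e. TB - 4*HR. The function names and the sample totals are made up.

```python
def rbi_estimate_a(hr, other_total_bases):
    """First form quoted above: 2 HR + .25 OTB."""
    return 2 * hr + 0.25 * other_total_bases


def rbi_estimate_b(hr, total_bases):
    """Second form quoted above: TB/4 + HR."""
    return total_bases / 4 + hr


# Assumption: OTB = TB - 4*HR (total bases from non-HR hits).
hr, total_bases = 30, 300
form_a = rbi_estimate_a(hr, total_bases - 4 * hr)
form_b = rbi_estimate_b(hr, total_bases)

# RBI value the formula implies for a single event of each type.
triple_value = rbi_estimate_b(hr=0, total_bases=3)
homer_value = rbi_estimate_b(hr=1, total_bases=4)
gap = homer_value - triple_value

print(form_a, form_b, triple_value, homer_value, gap)
```

Under that reading of OTB the two forms are identical, and the formula prices the triple at 0.75 RBI and the HR at 2.00, which is where the 1.25 difference comes from.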


#26    Tangotiger      (see all posts) 2007/11/20 (Tue) @ 16:26

http://www.billjamesonline.net/ArticleContent.aspx?AID=135&Code=James01030

I did something very similar in The Book, and I called it Tank Level, as in “how much left in the tank”.  IIRC, I would take 100% of the batters faced the day before (T-1), 75% at T-2, 50% at T-3, 25% at T-4.  (Maybe someone who has The Book can correct me.) Bill James uses: 100% at T-1, 80% at T-2, 60% at T-3, 40% at T-4, 20% at T-5. Essentially, it’s the same idea. I’m surprised that James did not mention The Book in any way in his article, which is a bit disappointing.
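
As a rough sketch of the two weighting schemes just described: the Tank Level weights are the ones recalled from memory above (flagged there as possibly needing correction), the James weights are as described, and the usage log is invented.

```python
def tank_level(batters_faced_by_day, weights):
    """Weighted sum of batters faced on recent days.

    batters_faced_by_day[0] is the day before (T-1), [1] is T-2, and so on.
    Days beyond the length of `weights` are ignored.
    """
    return sum(w * bf for w, bf in zip(weights, batters_faced_by_day))


TANGO_WEIGHTS = (1.00, 0.75, 0.50, 0.25)          # as recalled above; unverified
JAMES_WEIGHTS = (1.00, 0.80, 0.60, 0.40, 0.20)    # as described above

# Made-up usage log: batters faced at T-1 through T-5.
recent = [4, 0, 5, 3, 6]

tango_score = tank_level(recent, TANGO_WEIGHTS)
james_score = tank_level(recent, JAMES_WEIGHTS)
print(tango_score, round(james_score, 2))
```

The only structural difference is the length and steepness of the decay: the four-day scheme drops T-5 entirely, while the five-day scheme still counts it at 20%.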

There are some slight differences.  For example, I looked at all relievers 99-02, while he’s focusing on Mariano’s career.  While I didn’t see any impact in my group, he does see it in his player:

These are Mariano’s Games Appearances and ERA by the four groups:

Group                     Games   ERA
1) High Fatigue              90  2.71
2) Fatigue Fairly High      164  2.18
3) Fatigue Fairly Low       185  2.00
4) Low Fatigue              201  1.60

...

Group                             Games   ERA  Yankee WPct
1) High Fatigue                      63  2.57  .608
   High Fatigue NOT INCLUDING DL     63  2.57  .575
2) Fatigue Fairly High              123  2.36  .586
3) Middle group                     137  2.16  .594
4) Fatigue Fairly Low               152  1.67  .630
5) Low Fatigue                      165  1.77  .635

I am not a gambler, I don’t write to gamblers specifically, and I don’t claim to know the things that gamblers can profit from knowing.  However, it would seem to me that if you have a gambler’s focus, this would be an extremely important “razor” to sort the data, to decide whether to bet for or against the Yankees.  Fifty, sixty points of winning percentage is a huge edge.


#27    Tangotiger      (see all posts) 2007/11/20 (Tue) @ 16:43

http://www.billjamesonline.net/ArticleContent.aspx?AID=132&Code=James01031

This is Bill James’ version of merging WPA and Wisdom of the Crowds.

A snippet of his argument against WPA as a human-free system:

We could do it, of course, by some sort of situation-value analysis. A team’s chance of winning is .342 before this at bat, .421 after the at bat. . .total up the changes in the value of the state, and you have the value of the player.

That works real well in theory, but practically not at all in fact. To deal with defense, in that scheme, is either

* impossible, or
* extremely complicated and not very successful.

A fielder goes to the wall and…

There’s no question that WPA would benefit from more granular data.  Everything is a state, and you’d like to know the chances of winning every time someone touches (or should have touched) the ball, so you can determine the delta on the play.  So, he’s right, it would get complicated.

However, it really doesn’t matter.  What he’s talking about, while very real if you look at one game in isolation, will simply be swept away over a season.

But, to the extent that it does matter, you could add extra parameters to allow human judgement as to how fieldable a ball is.  I think that’s as far as I’d need WPA to go to make it worthwhile.
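
The bookkeeping behind "total up the changes in the value of the state" is simple enough to sketch. The first play uses the .342 to .421 example from the quote; the other plays and the player labels are invented, and the win probabilities are assumed to be from the batting team's point of view. Where the before/after probabilities come from is the hard part, and is not shown here.

```python
def wpa_by_player(plays):
    """Sum the change in win probability over each player's plays.

    plays: iterable of (player, win_prob_before, win_prob_after).
    """
    totals = {}
    for player, before, after in plays:
        totals[player] = totals.get(player, 0.0) + (after - before)
    return totals


plays = [
    ("Batter A", 0.342, 0.421),   # the example at bat quoted above
    ("Batter B", 0.421, 0.380),   # made-up
    ("Batter A", 0.500, 0.610),   # made-up
]

totals = wpa_by_player(plays)
for player, wpa in sorted(totals.items()):
    print(f"{player}: {wpa:+.3f}")
```

Adding a fieldability judgement, as suggested above, would amount to splitting each play's delta between pitcher and fielder using an extra human-supplied parameter.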

His manual system on the other hand could potentially have many more biases to be concerned with. I don’t think you’ll get the payoff you need.

As for his main issue that led to the post, about ARod and Ortiz, Ortiz was the WPA leader in 2005 over ARod by a wide margin (+8.9 wins to +6.1 wins):
http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=0&season=2005

Wouldn’t it have been an important thing to say, that WPA actually called Ortiz a far more instrumental hitter than ARod in helping his teams win those games?

***

I’m looking over this thread, and it sure seems I’m being picky, or contentious in dealing with James here.  It’s not something I’m trying to do.  I’m reading all of his stuff, because I love reading all of his stuff.

But, it seems to me that there are holes here, and the medium we now have with Bill James allows this instant analysis.

I don’t believe I’m wrong or unfair in anything I’ve said.  As long as I’m fair and truthful, I’m thinking everything I say is fine.  Which is what I think I’m doing here.


#28    Tangotiger      (see all posts) 2007/11/20 (Tue) @ 19:51

Pages 230-234 in The Book show my research on The Tank levels of relievers.  In “The Box” on page 234, I said that when relievers get 16 to 24 effective batters depleted from The Tank, they are still just as effective.

James’ Tank level is basically 5 times higher than mine, so that would be equivalent to around 100 “points” for him.

It’s possible that Mo is simply an anomaly, or his sample size doesn’t make his splits that significant.  I don’t know.  I’d like to see his OBP and SLG numbers, along with PA, so that we can better tell.

***

(And I finally copied the PDF file to my portable drive, so I’ll finally be able to reference it at the office.  I don’t know why it took me so long to do that!)


#29    studes      (see all posts) 2007/11/22 (Thu) @ 18:39

Tango, how are you getting this stuff?  I get a login box when I click on your links.


#30    tangotiger      (see all posts) 2007/11/22 (Thu) @ 23:06

Hmmm… I signed up as a Beta user, but that doesn’t look like an option anymore.


#31    Tangotiger      (see all posts) 2008/01/24 (Thu) @ 16:54

The editor of the Bill James Gold Mine book:

http://www.amazon.com/dp/0879463201?tag=tangotiger-20&camp=14573&creative=327641&linkCode=as1&creativeASIN=1597971294&adid=0A9Z11V4ETX1SQXJ547K&

What the Gold Mine will be is about 15 new articles by Bill on a variety of subjects--measuring consistency, strength up the middle, frustrating losses, clutch hitter of the year and others, some serious, some fun--and a bunch of new stats and “nuggets” (Gold Mine, get it?) about all thirty major league teams and their players.

(If you are going to buy the book, do it through the above link.)


#32    Tangotiger      (see all posts) 2008/02/07 (Thu) @ 14:31

A few years ago, I read a posting by Bill James where he said that the different flavors of run creation formulas out there are more like a personal taste, because they all are about the same.  I don’t necessarily disagree.

A few months ago, I pointed Bill James to my Markov calculator, which shows quite definitively that Runs Created is wrong at the high extreme levels.  Much to my surprise, James responded that he was aware of that, and articulated the correct reason (that the impact of the HR with all those runners on base is diminished since those runners have a good chance of scoring without the HR).

But, what seems contradictory to me is when James continually refines his RC formula, so much so that it doesn’t look much like the original.  Why not stick with one of the earlier tech versions instead of:
a) doing this:
B = 1.125*S + 1.69*D + 3.02*T + 3.73*HR + .29*(W - IW + HB) + .492*(SB + SH + SF) - .04*K

b) then going through the long Theoretical Team approach

It seems to me that he at least somewhat cares about the accuracy of the model.  And that ugly B component in RC is no prettier than the B component in BaseRuns. 

My question to you guys: should I bother writing to Bill James, and asking him what it would take for him to publicly accept BaseRuns?


#33    Tangotiger      (see all posts) 2008/02/14 (Thu) @ 17:11

Sounds like you guys don’t think I should bother.  That’s ok.  Probably for the best.

***

What Jeff does here is similar to one of the charts that James is going to do:
http://www.minorleaguesplits.com/tm/AshSAL/bbip06.html

I enjoy these kinds of charts, but would prefer that they be turned into “frequency” charts.  That is, you can make the first column the total number of samples (n), and then make all the other columns a rate of that.  This way, you can make comparisons, plus you’ll also be able to get the raw totals if you wanted.
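
A minimal sketch of that conversion, with invented category names and counts (the actual columns on the linked chart may differ):

```python
def to_frequency_row(counts):
    """Turn raw category counts into (n, {category: share of n})."""
    n = sum(counts.values())
    rates = {name: (c / n if n else 0.0) for name, c in counts.items()}
    return n, rates


def to_raw_counts(n, rates):
    """Recover the raw totals from n and the rates."""
    return {name: round(rate * n) for name, rate in rates.items()}


# Made-up batted-ball counts; the category labels are hypothetical.
counts = {"GB": 45, "FB": 30, "LD": 20, "PU": 5}

n, rates = to_frequency_row(counts)
print(n, {name: f"{rate:.1%}" for name, rate in rates.items()})
print(to_raw_counts(n, rates))
```

Keeping n as the first column is what makes the round trip back to raw totals possible.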

Jeff: I find the TABLE tag is much nicer if you set cellpadding = 4, and cellspacing = 0.  It’ll look cleaner and crisper.  Right now, you have 3 for each.  Compare:
http://www.minorleaguesplits.com/tm/AshSAL/bbip06.html
http://www.tangotiger.net/temp/bbip06.html


#34    Tangotiger      (see all posts) 2008/02/21 (Thu) @ 18:12

BJO is now online officially.

I’ll open threads here as are warranted or asked of me.  Or simply use this thread.

For those who can’t get enough of their BJ fix, this site will be great.  For those who want some better web features, you might figure they should have spent more time on the design of the site.  (The comments will easily become a mess, and the stat pages don’t link to each other like B-R and Retro.)

But, the BJ fix cures all.


#35    tangotiger      (see all posts) 2008/02/21 (Thu) @ 21:00

One thing I’ve been meaning to do for a long time is “batter’s assists”.  Basically the guy who moves the runner over who eventually ends up scoring.  (I’m certainly not the first one with the idea.) I’m not sure what relevance it has, other than it’s an easy bookkeeping thing to do.

BJ is tracking that very thing, and he gives it a cooler name too: Ghost Runs.


#36    Tangotiger      (see all posts) 2008/02/27 (Wed) @ 15:31

I’m annoyed.

James’ website is straight out of 1998.  He’s got a page with a list of all his articles, which is good.  But, that list has only the title.  No summary, no teaser, no “category”, nothing to describe it in any way.  Furthermore, you can comment on each article, but the article listing doesn’t show you if there are any comments, nor the last time a comment was made.  So, there are no incentives for me to go back to an article.  A perfect example of needing free or low-cost blogging software (WordPress, ExpressionEngine, or whatnot).

Then the stats section.  Hasn’t b-r.com proven that you need to have all your pages linked to each other?  I should be able to get from Tim Raines to Andre Dawson with two clicks (one to get to the Expos page, the other to select another Expo).  One click would be even better.  For BJ, the stats pages are basically static Handbook pages posted online.

Here’s what Bill writes:

Here’s my idea. Each page has a “number” or “code” that will take you right to that page. The page number for my most recent article, for example, is always going to be James01001. The page number for my most recent column is always James 02001. The second most recent article is James01002. James01003 for the third most recent article. Etc.

In the Statistics area (which, by volume, is the largest area of the site and will be where many or most people spend much of their time) we assign every player one OR MORE five-letter codes. Let’s say we have 60 profiles or 60 charts about Jim Thome. His code name happens to be his exact last name, Thome. Each chart has a code number. Jim Thome’s Hitting Profile happens to be 01. RBI Analysis is 04. The year follows the profile code. So, instead of following menus, if you want to jump straight to Jim Thome’s hitting profile for 2006, you can put Thome0106 into the site search or the Quick Jump box and go there immediately. Lance Berkman’s code is LBerk. To get his RBI Analysis for 2005, that’s LBerk0405.

The Page Codes always appear at the top right part of the page in large letters. Let’s say that the new reader wants to find Paul Konerko’s RBI profile for 2007. The new user doesn’t know how to find that, so he gets there through a series of screens/steps—

Statistics (click hyperlink)
RBI Analysis (click hyperlink)
Type Player’s name: Konerko
Select Season 2006
Click Go

Once he does that, it goes to the screen, which says on it, in large letters at the top PKone0406. The 04 is RBI Analysis and the 06 is 2006.

After a while the reader knows that Joe Crede is Crede, that Jim Thome is Thome, that David Ortiz is BPapi, that Manny Ramirez is Manny, Paul Konerko is PKone, etc., and after a while he knows that the RBI profile is number 04, the Performance as Leadoff Man profile is number 10, etc… thus he learns to navigate the site easily by typing the page number into the Quick Jump or Site Search boxes.

The page codes are a good idea… as a supplement.  There’s simply no way anyone is going to remember the codes for their players, the code number for each section, etc.  Maybe if his were the only resource.  But, c’mon.  B-r.com uses easy codes too (first 5 characters of last name, 2 characters of first name, 2 digit code).  But, I would bet anything that most people will type in Raines, rather than raineti01.
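To make the two schemes concrete, here is a minimal sketch of how they decode.  It assumes only what is quoted above: a five-letter player code, a two-digit profile number, and a two-digit year.  The function names are hypothetical, the profile table holds only the three numbers James mentions, and treating every two-digit year as 20xx is my assumption, not something the site documents.

```python
import re
from typing import NamedTuple

# Only the profile numbers named in the quoted text; the full list is not given.
PROFILES = {
    "01": "Hitting Profile",
    "04": "RBI Analysis",
    "10": "Performance as Leadoff Man",
}


class PageCode(NamedTuple):
    player: str   # five-letter player code, e.g. "Thome" or "LBerk"
    profile: str  # profile name, or "profile NN" if the number is unknown
    year: int


# Statistics page codes only: 5 letters + 2-digit profile + 2-digit year.
# Article codes such as James01001 use a different shape and are rejected.
_CODE = re.compile(r"^([A-Za-z]{5})(\d{2})(\d{2})$")


def parse_page_code(code: str) -> PageCode:
    """Split a statistics page code into player, profile and year."""
    m = _CODE.match(code.strip())
    if m is None:
        raise ValueError(f"not a statistics page code: {code!r}")
    player, profile, yy = m.groups()
    # Assumption: all two-digit years are 20xx.
    year = 2000 + int(yy)
    return PageCode(player, PROFILES.get(profile, f"profile {profile}"), year)


def bref_style_id(first: str, last: str, seq: int = 1) -> str:
    """B-r.com-style id as described above: 5 of last name, 2 of first, 2-digit code."""
    return f"{last[:5]}{first[:2]}{seq:02d}".lower()


print(parse_page_code("Thome0106"))
print(parse_page_code("LBerk0405"))
print(bref_style_id("Tim", "Raines"))
```

Either way, the reader has to know the code before he can type it, which is the whole problem.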

In any case, mouse clicks are almost always preferred to typing letters.

***

The reader posts section, essentially a “discussion forum” like Baseball Fever, is extremely bare.  Again, you can’t just say “forum”, and expect people to flock there.  If Bill James were to be engaged in each discussion, then that’ll work.  But as it stands now?  Nope.  Plus, if it ever takes off, the complete absence of topic groupings makes it useless.  Again, free discussion software (phpBB, etc) would have been ideal.

***

But, it’s Bill James.  It’s amazing how much I’ll let things slide for him.


#37    Tangotiger      (see all posts) 2008/02/27 (Wed) @ 15:34

Oh, and no HTML codes at all.  You can’t put paragraph marks, nothing.  Just one long paragraph.  It’s like heaven to MGL, but hell to me.


#38    Tangotiger      (see all posts) 2008/03/12 (Wed) @ 12:12

Bill James:

I generally don’t comment on other people’s stats because

a) I don’t want to get trapped into criticizing other people’s work,

b) I don’t want to get drawn into partisan arguments, or

c) I don’t like to reveal my own ignorance.

He was still tearing Palmer/Linear Weights down in his Win Shares book, so I can’t buy a).

If you can be honest in your analysis, b) doesn’t apply.

That leaves c).  It’s weird.  I get an immense satisfaction in reading the works of others, be it MGL, Dan Fox, John Walsh, (and a long list of others).  It is a pure pleasure.  For someone like James to hold the view that he does (that he doesn’t keep up with the sabermetric world) is so very strange.  It just seems so very one-way.  I don’t get it.


#39    Tangotiger      (see all posts) 2008/03/12 (Wed) @ 13:00

To put it another way, would Martin Scorsese only watch his own movies, and only talk about movies that he’s done?  Doesn’t he get a thrill watching Spielberg, or Coppola, or Woody Allen, or Spike Lee (or whoever might float his boat)?  Does Denzel Washington not go watch a Tom Hanks or Meryl Streep movie?  And can’t these people offer an honest assessment of their peers?

Help me out guys…


#40    dave smyth      (see all posts) 2008/03/12 (Wed) @ 20:06

Maybe James is simply using a ‘defense mechanism’ (from psychology), whether consciously or subconsciously. IOW, if he doesn’t read other work, he doesn’t have to deal with the possibility that he is no longer the MAN. And, if so, maybe that’s good for US, because if he thought that he was really out to pasture, his ego might result in him just going away. It’s better to have Bill James around, as long as possible…


#41    Tangotiger      (see all posts) 2008/04/28 (Mon) @ 13:10

Should I bother sending this, or will we simply accept what I tell my kid: you get what you get, and you don’t get upset.

There are some strong ways to make online discussions active.  The current setup at BJOL doesn’t use those ways.  Let me specify the issues and the resolutions.  I will be rather harsh, but constructively so.

1. The Reader Posts section is basically dead.  This was easily predicted.  The setup of that page follows the form of 10 years ago: just a listing as things pop up.  A true discussion forum will *always* bubble to the top the threads with the most recent comments, so we know what is active.  As it is, there is no motivation for someone to look 15 posts down and start expanding threads to see if a new comment came along.  A phpBB.com type forum is a necessity.

Ask yourself: “What was the objective of the Reader Posts section?”.  My objective would be to engage with like-minded intelligent folks on various baseball topics.  This does not happen.

2. The comments section following Bill James articles is basically dead.  Proof?  Several questions posed to Bill himself are unanswered.  Why is that?  I will guess that Bill must be as tired as we are of the inefficiency of culling through 50 articles to see if someone made a comment or asked a question.  Once again, any recent comment should bubble the article to the top.  Wordpress and every other blog software in existence has this feature.

Ask yourself: “What was the objective of getting the readers to post?”.  If it was just to let them have their say and have almost no one know to read it, then fine.  This is the equivalent of leaving a customer service message on a voice mail, and expecting a call back.

3. The Poll.  I should be able to view the answers without myself answering the question.

I will stop now, as I have no idea how seriously the concerns are being taken, whether there’s any effort to do something about them, or whether what I’m saying is being taken personally.

We’re looking for a good product, and to engage with Bill.  And this is only happening in the “Hey Bill” section, an excellent section.  Further proof that it works is that questions that should be posed following James articles are instead being posed in “Hey Bill”.  And, even if James does answer following the articles, how would we know that he answered there?

Tom
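For what it’s worth, the “bubble to the top” ordering in points 1 and 2 is nothing more than sorting by latest activity instead of by creation date.  A minimal sketch, with made-up threads and dates (none of this is BJOL’s actual data or code):

```python
from datetime import datetime

# Hypothetical threads: when each was created, and when each comment arrived.
threads = [
    {"title": "Thread A", "created": datetime(2008, 4, 1),
     "comments": [datetime(2008, 4, 2)]},
    {"title": "Thread B", "created": datetime(2008, 3, 1),
     "comments": [datetime(2008, 4, 27)]},
    {"title": "Thread C", "created": datetime(2008, 4, 20),
     "comments": []},
]


def last_activity(thread):
    """Most recent timestamp on a thread: its creation or its newest comment."""
    return max([thread["created"], *thread["comments"]])


def by_creation(items):
    """Newest thread first; a fresh comment on an old thread stays buried."""
    return [t["title"] for t in sorted(items, key=lambda t: t["created"], reverse=True)]


def by_activity(items):
    """Thread with the newest comment first, so active discussions surface."""
    return [t["title"] for t in sorted(items, key=last_activity, reverse=True)]


print(by_creation(threads))  # Thread B sits last despite the newest comment
print(by_activity(threads))  # Thread B moves to the top
```

Under the first ordering, the oldest thread stays at the bottom no matter how recently someone replied to it.  Under the second, that same reply puts it first.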


#42    MGL      (see all posts) 2008/04/28 (Mon) @ 23:53

I don’t know much about the web and blogs, but if what Tango means is that discussions are getting buried, then he is 100% correct.  If it weren’t for the section at the bottom of this blog where you can see which thread has the most recent comments (or something similar), almost every discussion would be quickly dead and buried, before its time.

For example, on BTF, once a thread leaves the “first page” it is usually dead and buried even if people want to continue.  You have to have some way for the most recent comments on threads to bubble to the top, or else interesting threads will die a quick and undeserved death.  Of course on BTF, that is not really practical because of the wide readership and the number of articles.

If that is what he means.

Yes, I would send those comments to James, maybe in a “softer” way though.


#43    Tangotiger      (see all posts) 2008/05/29 (Thu) @ 22:28

Ah, the ubiquitous Win Probability Chart.  Can no one save us from this plague?

-- Bill James

***

D’em d’ose ‘r fightin’ words, I reckon.

Unfortunately, Bill provides no additional context or conditions as to what he means by his statement.


#44    Patriot      (see all posts) 2008/05/29 (Thu) @ 23:27

James has never been a fan of what he called “micro-level” analysis.  In the section on The Hidden Game in the original Historical Abstract, he compared using RE to evaluate strategy to flying in an ultralight, IIRC.

I can’t recall James ever using RE in any way, shape, or form, although obviously my memory could be off.  We all know for certain that he has little use for its offshoot, LW.  So I’m not surprised that he’s not embracing WE.


#45    terpsfan101      (see all posts) 2008/10/28 (Tue) @ 07:36

Once again Bill James comes up with an interesting idea, but he fails miserably in its implementation. Has anyone seen the ridiculous weights he uses for the “hottest hitters”? He gives the same weight to a Sac Fly as he does to a walk and an ROE. Furthermore, he gives a higher weight to a fly-ball out (excluding sac flies) than he does to a ground-ball out (excluding double plays). Is this some cheap imitation of Tango’s Linear Weight Ratio?


#46    terpsfan101      (see all posts) 2008/10/28 (Tue) @ 07:46

Here are those insanely inaccurate weights:

Home Run - 15 points
Triple - 13
Double - 11
Single - 9
Walk - 6 (Intentional Walk the same)
Hit by Pitch - 6
Sac Fly - 6
ROE - 6
Sac Hit - 5
Fly Out - 3
Ground Out - 2
GIDP - 0
Strikeout - 0


#47    Tangotiger      (see all posts) 2008/10/28 (Tue) @ 09:20

Here are the weights that terps presented, along with the weights wOBA uses (in parens).  To put them on a somewhat similar scale, I will simply multiply the wOBA weights by 9:

Home Run - 15 (17.5)
Triple - 13 (14)
Double - 11 (11)
Single - 9 (8)
Walk - 6 (6.5)
Hit by Pitch - 6 (6.5)
Sac Fly - 6 (0)
ROE - 6 (8)
Sac Hit - 5 (3.5)
Fly Out - 3 (0)
Ground Out - 2 (0)
GIDP - 0 (0)
Strikeout - 0 (0)

Remember that wOBA is simply Linear Weights, on a rate scale, forcing the run value of an out to zero.  We see that there’s not much disagreement between the two among the major events.  But, we’ve got issues with the out.

Now, James is trying to figure “hot” hitters.  And, I suppose from that standpoint, if you are K-ing a lot, you cannot be hot.  (Tell that to Howard and Dunn.) I don’t necessarily agree or disagree, but the onus is on James to show us that this blatant bias he is introducing is true.  He is saying that a double+K is equal to a single+groundOut in terms of “hotness”.

I don’t know that this is true, but you have to presume it is false, until evidence is presented.  Had he given a “1” for ground out, fly out and sac fly, and brought down the sac hit to a 4 or 3, then we’d really be in pretty good agreement.
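To check the arithmetic, here is a minimal sketch that scores the two pairs of events under each set of weights.  The numbers are copied from the lists above (the wOBA column is the already-multiplied-by-9 version).  The adjusted set applies the changes suggested in the last paragraph, taking 4 for the sac hit.  The names are mine, for illustration only.

```python
# Weights as listed by terps for James's "hottest hitters".
JAMES = {
    "Home Run": 15, "Triple": 13, "Double": 11, "Single": 9,
    "Walk": 6, "Hit by Pitch": 6, "Sac Fly": 6, "ROE": 6, "Sac Hit": 5,
    "Fly Out": 3, "Ground Out": 2, "GIDP": 0, "Strikeout": 0,
}

# The parenthesized values above: wOBA weights multiplied by 9 and rounded.
WOBA_X9 = {
    "Home Run": 17.5, "Triple": 14, "Double": 11, "Single": 8,
    "Walk": 6.5, "Hit by Pitch": 6.5, "Sac Fly": 0, "ROE": 8, "Sac Hit": 3.5,
    "Fly Out": 0, "Ground Out": 0, "GIDP": 0, "Strikeout": 0,
}

# James's weights with the suggested changes: 1 for ground out, fly out and
# sac fly, and the sac hit brought down to 4.
ADJUSTED = {**JAMES, "Ground Out": 1, "Fly Out": 1, "Sac Fly": 1, "Sac Hit": 4}


def score(events, weights):
    """Sum the weight of each event in a sequence of plate appearances."""
    return sum(weights[e] for e in events)


double_k = ["Double", "Strikeout"]
single_go = ["Single", "Ground Out"]

for name, weights in [("James", JAMES), ("wOBA x9", WOBA_X9), ("adjusted", ADJUSTED)]:
    print(name, score(double_k, weights), score(single_go, weights))
```

Under James’s weights both pairs score 11.  Under the scaled wOBA weights the double+K stays at 11 while the single+groundOut drops to 8, and the adjusted weights land in between at 11 and 10.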


#48    terpsfan101      (see all posts) 2008/10/28 (Tue) @ 13:56

Thanks for explaining what James is doing here. You’ve convinced me that the weights are more reasonable than I had initially assumed. Although, the weight of the SF still makes no sense.


#49    Tangotiger      (see all posts) 2008/10/28 (Tue) @ 14:26

Sure thing.  Bill James simply dances around the entire Linear Weights thing.  He really uses it a lot, and he doesn’t realize it.  His new Runs Created is Linear Weights.  Win Shares is Linear Weights. His hotness index is Linear Weights.


#50    MGL      (see all posts) 2008/10/28 (Tue) @ 22:44

I have absolutely zero interest in a “hotness stat,” thus I don’t care what weights he used for the various events.

“Hotness” can mean anything to anyone and is certainly not an issue for analysts.  So why should we worry or care what weights he uses?  There are no “correct” ones. Maybe the people that like this “stat” think that being “hot” means hitting the ball hard?  In that case, maybe a line drive hit or out gets a high weight and a weak grounder or bloop gets a low weight, hit or out.  Maybe they don’t like K’s.  Who cares what the linear weights value of a K is?

Did BJ or anyone else say that he is talking about “hotness” as it relates to context-neutral average theoretical run production?  If not, he can put in anything he wants for those weights/values.  They don’t require lwts values.

Silly stat, silly discussion.

It’s like arguing that a person who invents the “Looking good at the plate” stat is using the wrong numbers.  How the heck do you know?


#51    terpsfan101      (see all posts) 2008/10/28 (Tue) @ 23:04

Well said MGL. Can you please forward your last post to Mr. James! By the way, I’ve noticed that he still hasn’t fixed the comments section on his website. All of the reader comments are still listed in one area. His website needs a serious make-over.


#52    MGL      (see all posts) 2008/10/29 (Wed) @ 00:17

Nah.  Bill and I occasionally email one another.  It is one thing to criticize someone on a website or blog that they probably do not read and another to send them the criticism.  I have nothing against him.  I rarely go to his site.  I have little use for it.  Same for BP.  I wish them both nothing but success.


#53    terpsfan101      (see all posts) 2008/10/29 (Wed) @ 00:34

I was just joking around anyway. I should have made that clearer.

