
THE BOOK--Playing The Percentages In Baseball


Tuesday, October 06, 2009

Incorporating guts into a forecast

By Tangotiger, 09:37 AM

David says:

My expectation was that a computer would be much better at assimilating a lot of statistical information into one final prediction than the human brain, and while I still do believe that to be the case, it does appear that we humans can see something computers do not.

But then he goes on to say:

Looking at the hitters I thought would beat their projections, I saw a lot of special skills, most of them young, but all very talented.... The hitters I thought our projections overrated were mostly some combination of old, fat and strikeout-prone. ... The only thing that really jumps out at me is that I liked a lot of high-strikeout guys, while a lot of the pitchers I didn’t like are below-average at whiffing hitters.

The “computer” that David references is the algorithm he designed to create the forecasts, and the computer simply sped up the process.  That’s the only thing the computer did: speed.  The algorithm was designed by a human.  Furthermore, that human chose to leave out factors that would have helped his algorithm.  So, he “knew” (or suspected, anyway) that high-K pitchers have an extra oomph (something more real, or a better ERA to regress toward).  But he didn’t put that in his algorithm.  This is the kind of thing that PECOTA would implicitly capture.  For example, it would look at the high-K pitchers, find the comparable pitchers, and use that as an extra regression point.

Anyway, all David has to do is create additional parameters for his algorithm.  He can set a “1” for anyone his baseball guts say will improve, and a “-1” for anyone they say will decline.  That gives us an extra parameter for the regression equation.  If his baseball guts are worth 50 points of OPS and 0.50 of ERA, then he can include that in his equation.  Basically, if he has a reason to suspect that a player’s 2008 or 2007 stats are not representative of that player, he can fudge that data by introducing a Guts parameter.
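A minimal sketch of how a Guts parameter could be sized, assuming each player carries a hypothetical `guts` flag coded +1 (gut says improve), -1 (gut says decline), or 0 (no call), and that the model’s own projection is held fixed. Regressing the projection error on the flag (least squares through the origin) estimates how many points of OPS a gut call is worth. The `Player` fields and function names are illustrative, not David’s actual system.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    projected_ops: float
    actual_ops: float
    guts: int  # hypothetical flag: +1 = gut says improve, -1 = gut says decline, 0 = no call


def fit_guts_coefficient(players):
    """Least-squares slope (no intercept) of (actual - projected) on the guts flag."""
    num = sum(p.guts * (p.actual_ops - p.projected_ops) for p in players)
    den = sum(p.guts * p.guts for p in players)
    if den == 0:
        raise ValueError("no players have a nonzero guts flag")
    return num / den


def adjusted_projection(player, coef):
    """Shift the model's projection by the fitted value of the gut call."""
    return player.projected_ops + coef * player.guts
```

In practice the coefficient would be fitted on past seasons and then applied to next year’s flagged players; fitting and applying it on the same season would overstate its value.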

I think MGL has said that he manually makes park factor changes, as he thinks appropriate.  It’s the same deal here. 

Kudos to David for showing that he’s got baseball guts.  Now he just needs to include that in his algorithm, so that next year he can’t beat his own algorithm.


#1    Guy      2009/10/06 (Tue) @ 10:27

I think it’s useful to distinguish between finding systematic improvements to his model (or uncovering systematic flaws in his model, which is the same thing) that apply across players with similar statistics, and situations where his “gut” or direct observation of a player tells him something that the statistics cannot.  Looking at his successful calls, you might want to explore theories like:
1) very young players given lots of playing time (e.g. Upton) should be regressed to a higher mean;
2) players in their mid-30s need to be projected lower (aging curve is off);
3) for dramatic changes in performance after age 32, give a lot of weight to declines (Ortiz) but less weight to improvements (Chipper);
4) give more weight to performance from more than 3 years prior (Chipper, Ichiro);
5) lower projections for players with very low BA relative to OPS (Cust);
6) include weight or body mass as a predictor (Ortiz, Howard);
7) whatever else you do, don’t forecast Ichiro to be 70 points below his career OPS, and don’t project Danny Haren to have a 4.22 ERA (seriously, how hard was it to call these players right?).
I’m not saying any of these are right, and maybe none are.  But testing them might allow for improvements in the model.

Then, there may still be things that can be seen by an observer but are not (yet) captured by the data.  But I’d put that in a separate category.
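Testing a theory like #2 mostly comes down to checking whether the projection errors for one group of players average away from zero. A minimal sketch, assuming each row is a dict with hypothetical keys `age`, `projected`, and `actual`; the age brackets are illustrative only. The count is returned alongside the mean because a small group can show a large average error by chance.

```python
from collections import defaultdict
from statistics import mean


def mean_error_by_group(rows, key):
    """Group rows with key(row) and return {group: (count, mean of actual - projected)}.

    A group whose mean error sits well away from zero, given its count, is a
    candidate for a systematic fix in the model.
    """
    errors = defaultdict(list)
    for row in rows:
        errors[key(row)].append(row["actual"] - row["projected"])
    return {group: (len(errs), mean(errs)) for group, errs in errors.items()}


def age_bracket(row):
    """Hypothetical brackets for testing theory 2 (mid-30s players projected too high)."""
    if row["age"] < 25:
        return "under 25"
    if row["age"] < 33:
        return "25-32"
    return "33+"
```

The same function works for the other theories by swapping the key, for example a flag for very low BA relative to OPS.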


#2    Peter Jensen      2009/10/06 (Tue) @ 11:00

I think it was a positive first step for David to question his own algorithm and publish his gut choices.  His instincts as to where his program was weak seem to have been proven correct by the results.  But since he didn’t actually make a gut projection of what the 29 players’ actual stats were going to be, there is no actual evidence that his gut projections were any more accurate than his algorithm projections.  Yes, he correctly projected which players would do better and which would do worse, but it is possible that his gut projections would have overshot the actual stats by a greater amount than the algorithm missed them.  He also cherry-picked the 29 projections that he thought were the most likely to be wrong.  If he were to do what Tango is suggesting, improving his program by adding his gut as a factor, he has no information to help him decide where to draw the line.  Will he alter 29 projections? Or 15? Or 100?  He needs to do what Guy suggests: use these results as a guideline for adjusting the original factors in his projection algorithm to improve the overall accuracy.


#3    Tangotiger      2009/10/06 (Tue) @ 11:23

Agreed on all counts.

As an example, when Marcel does its forecast, it simply keeps the weights at 5/4/3 for the last 3 seasons.  But perhaps the weighting should be 6/4/2 for guys over 35.  And maybe it should be 7/4/1 for guys over 35 whose performance in the last year dropped by at least 30%.

Your baseball guts will at least point you toward an area of research to study.

And while Marcel+guts might suggest 6/4/2 for guys over 35, it might suggest 7/4/1 for the HR component of guys over 35 and 5/4/3 for their BABIP component, etc., etc.
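A minimal sketch of the season weighting described above, assuming three seasons listed most recent first. The weights multiply both the counts and the PA, so a season with more playing time counts for more. The real Marcel also regresses toward the league average and applies an age adjustment; neither is shown here. The `choose_weights` rule is the hypothetical “perhaps” from the comment (age over 35, a 30% drop), not part of Marcel, and per-component weights can be had by calling `weighted_rate` with different weights for HR and for BABIP.

```python
# Marcel's season weights, most recent season first.
MARCEL_WEIGHTS = (5, 4, 3)


def weighted_rate(events, pa, weights=MARCEL_WEIGHTS):
    """Weighted rate over the last three seasons (most recent first)."""
    num = sum(w * e for w, e in zip(weights, events))
    den = sum(w * p for w, p in zip(weights, pa))
    if den == 0:
        raise ValueError("no weighted playing time")
    return num / den


def choose_weights(age, last_rate, previous_rate):
    """Hypothetical age-based rule from the comment above; not part of Marcel."""
    if age > 35 and previous_rate > 0 and last_rate <= 0.7 * previous_rate:
        return (7, 4, 1)  # over 35 and dropped by at least 30% last year
    if age > 35:
        return (6, 4, 2)
    return MARCEL_WEIGHTS


# Usage with made-up HR totals for a 36-year-old (most recent season first).
hr, pa = [20, 35, 38], [500, 600, 600]
weights = choose_weights(36, hr[0] / pa[0], hr[1] / pa[1])
hr_rate = weighted_rate(hr, pa, weights)
```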

All-in-all, if your standard forecasting system has a .510 record against Marcel (and I’m not sure it’s even that high), a super-duper-complex one might bump it up to a .520 record.  That’s not to say that we shouldn’t try, as we may unearth nuggets.  But, the end-result is severely constrained as to how far you can go.
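One way to read a “.510 record against Marcel” is head-to-head: for each player, whichever system’s forecast lands closer to the actual result gets the win. The thread doesn’t spell out the scoring, so this sketch assumes that definition, with ties counted as half a win.

```python
def head_to_head_record(actual, system_a, system_b):
    """Share of players for whom system A's forecast is closer than system B's (ties count half)."""
    if not actual or not (len(actual) == len(system_a) == len(system_b)):
        raise ValueError("need equal-length, non-empty lists")
    wins = 0.0
    for obs, a, b in zip(actual, system_a, system_b):
        err_a, err_b = abs(obs - a), abs(obs - b)
        if err_a < err_b:
            wins += 1.0
        elif err_a == err_b:
            wins += 0.5
    return wins / len(actual)
```

Because each player counts once regardless of how far off either system was, a record like .510 versus .520 says little about the size of the misses.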


#4    David Gassko      2009/10/06 (Tue) @ 11:27

Guy,

THT thought Ichiro would post a .738 OPS. ZiPS said .737. CHONE said .745. Even Marcel said .758. The Haren projection was definitely an outlier, and I don’t really know why. It wasn’t very good.

The rest of your suggestions are very good. It’s possible that all of my predictions would have been made by a better projection model, but I don’t know that there isn’t some role for “baseball guts” that can’t be programmed into a projection system, beyond Tango’s suggestion, of course.


#5    David Gassko      2009/10/06 (Tue) @ 11:28

And just to answer Tango’s post, we do different weightings for each component by age. So that’s definitely not enough!


#6    Tangotiger      2009/10/06 (Tue) @ 11:46

David/5: right, exactly.  The component weighting by age might be too moderate. 

The interesting question is this: if you overcompensate in that regard (say you decide that HR/PA needs to be weighted more heavily toward recent seasons for guys over 35 who dropped their HR/PA rate by at least 30%), then how does that impact all such players?  It’s one thing to pick out Ortiz, but what about other players who were in a similar boat, but whom David didn’t think to classify similarly with his baseball guts?

So, yes, he needs to build a better model to minimize what his baseball guts see (in the numbers).

But he can certainly go above and beyond and tell us he “sees” something (scouting-wise) BEYOND the numbers.  If he bases his baseball guts on the numbers, then the correct thing to do is incorporate it in his algorithm.

If he’s going beyond the numbers, then that’s a scouting number he has to tag to each player.

As an example: I don’t care *at all* what UZR says about Andrus (Rangers) or Escobar (Brewers).  At all.  Well, maybe 10% or something.  These guys were highly thought of in the minor leagues for their fielding tools, and they were brought up in circumstances that showed the team’s trust in them (moving Young to 3B, or shipping Hardy out altogether) to make room for their gloves.  AND the Fans’ Scouting Report already rates them as gold-glove-caliber players.  Given all that, it doesn’t matter what UZR thinks of them.  I’d already peg them as +15 runs per 150 G players.  Their UZR might bump them up to +16 or +17, or maybe drop them down to +13 or something.  But that’s it.

So, if this is what David is talking about, when using his baseball guts, then good.  Tag those players as such.

But, if he’s saying that there’s a certain combination of observable performance numbers that leads to some new insight, then that has to be built into the algorithm.


#7    Tangotiger      2009/10/06 (Tue) @ 11:57

I was curious, so I checked them out.  Andrus is +10 runs in 181 “equivalent games”.  Escobar is 0 runs in 41 games.  Based on this, I might give Andrus +14 and Escobar +13 or something.

For any player, the more they play, the less the Scouting Report applies, and the more actual performance does (unless we suspect a change in talent level).
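The principle in the last paragraph can be sketched as a games-weighted blend: the scouting view acts like a fixed number of “games” of evidence, and actual UZR takes over as real games pile up. This is a generic regression-to-a-prior sketch, not Tango’s actual calculation; `prior_games` is a hypothetical constant, and the output depends entirely on it, so it is not meant to reproduce the +14/+13 figures above.

```python
def blended_fielding(prior_per_150, uzr_runs, games, prior_games):
    """Regress observed UZR toward a scouting prior, in runs per 150 games.

    prior_per_150: scouting-based estimate (e.g. +15 runs per 150 G).
    uzr_runs, games: observed UZR runs and the (equivalent) games they came from.
    prior_games: hypothetical constant, how many games of UZR the scouting view is "worth".
    """
    if prior_games + games <= 0:
        raise ValueError("need positive total weight")
    # Same as weighting the observed per-150 rate by games and the prior by prior_games:
    # games * (uzr_runs * 150 / games) simplifies to 150 * uzr_runs.
    return (prior_games * prior_per_150 + 150.0 * uzr_runs) / (prior_games + games)
```

With no games the estimate is the scouting prior; the weight on UZR is `games / (games + prior_games)`, which grows toward 1 as the player accumulates playing time.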


#8    Guy      2009/10/06 (Tue) @ 12:13

David:  I was too glib re: Ichiro.  It’s interesting that his projections were so low.  I imagine that’s a function of his age?  Perhaps speed indicators (43 SB/47 SBA last year) would tell us a player’s age curve needs to be flatter than usual.  Or maybe that’s a case where it really is just your gut.

Great job.  Will you try again next year, or retire and “go out on top?” :>)


#9    StevenEll      2009/10/06 (Tue) @ 12:31

Guy, I have a feeling that Ichiro is just going to be one of those guys who you will have to use your gut for.  The speed score thing is interesting though.  Maybe not necessarily that he has such a high speed score, but that it hasn’t dropped as he’s aged.  UZR might help in that regard also.


#10    Nick      2009/10/06 (Tue) @ 21:49

Projections always underrate Ichiro.  You cheated David!

Anyway, the reservation I had with this piece is that it is a very small sample size of projections.  It’s entirely possible that your guts could suck, and you simply got lucky on those 29 players. 

What would be interesting would be to take a lot of qualified analysts, assign them a set of 30 different projections, and have them do the same exercise that you did.  Even better, have them project a specific OPS so we could use RMSE.
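If each analyst wrote down a specific OPS, the gut forecasts and the system forecasts could be scored on the same players with root mean squared error, where lower is better. A minimal sketch, assuming plain lists of OPS values aligned by player; unlike a right/wrong count, RMSE penalizes large misses heavily.

```python
from math import sqrt


def rmse(forecasts, actuals):
    """Root mean squared error between aligned forecast and actual values."""
    if not forecasts or len(forecasts) != len(actuals):
        raise ValueError("need equal-length, non-empty lists")
    return sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals))
```

Comparing `rmse(gut_ops, actual_ops)` with `rmse(system_ops, actual_ops)` over the same players (hypothetical list names) would show whether the gut calls improved accuracy, not just direction.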


#11          2009/10/06 (Tue) @ 22:27

I think that David’s feat is quite impressive and I have little doubt that there is a high degree of probability that the data he presents evinces a skill on his part.

The discussion on BBTF is fairly interesting. I am pretty sure that looking at the number of correct and incorrect calls without regard to the magnitude is NOT the best way to evaluate his skill, although it is ONE way (but not a particularly good one).  Whether to weight by PA (or TBF) or give each player equal weight, I am not sure.  And whether to “cap” the weights, as Tango suggests, I am not sure either.
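The evaluation choices in that paragraph can be made explicit in one scoring function. A minimal sketch, assuming each call is a tuple `(direction, projected, actual, pa)` with direction +1 for “will beat the projection” and -1 for “will fall short”. It returns both the weighted share of correct calls and the weighted mean error in the called direction, so magnitude is not ignored. The cap value is a hypothetical parameter; the thread doesn’t say what cap Tango had in mind.

```python
def score_calls(calls, weight_by_pa=False, cap=None):
    """Return (weighted share of correct calls, weighted mean signed error in the called direction)."""
    hits = signed = total = 0.0
    for direction, projected, actual, pa in calls:
        w = float(pa) if weight_by_pa else 1.0
        if cap is not None:
            w = min(w, cap)  # hypothetical cap so a few full-time players don't dominate
        miss = actual - projected
        total += w
        if direction * miss > 0:
            hits += w
        signed += w * direction * miss
    if total == 0:
        raise ValueError("no weighted calls")
    return hits / total, signed / total
```

Running the same calls with equal weights, PA weights, and capped PA weights shows how much the verdict depends on that choice.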

I think you would have to reverse-engineer the process to see which way to evaluate yields the fewest mistakes.  That is complicated.  You would have to assume that he is correct on a majority of his players and then see which way of evaluating gives you the fewest false positives and false negatives.  Then do the same thing assuming he is incorrect in his assessment of the players in question.  You would have to simulate a few thousand seasons making certain assumptions.

Saying that he “cherry picked” the players, and therefore the exercise is not legitimate, is almost certainly not correct.  Of course he “cherry picked” the players.  That is the point. He doesn’t claim that he can do the same thing with all players.  He only claims that he can look at the entire list and identify players that he knows something about, either consciously or subconsciously, such that he can identify whether he thinks the projection is a good or bad one.  Nothing wrong with that hypothesis.  It could be 50 players or it could be 10 players.  That is his choice.  Obviously the more players, the larger the sample size, and the more likely the results are to be significant.

The only thing that the “study” is lacking, which it would be nice to know (the data is there for someone to compute it though), is the significance of his results - IOW, the chances that he got lucky and that he had no skill at all.
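The simplest version of that significance check treats each of the 29 calls as a coin flip for a no-skill guesser and asks how often such a guesser would get at least as many right. A minimal sketch using an exact binomial tail; the number of correct calls isn’t stated in this thread, so `k` is left to the reader. It ignores magnitude (the weakness noted above) and assumes a 50% hit rate for a guesser, which may not hold if the projections themselves are biased in one direction.

```python
from math import comb


def p_value_at_least(k, n, p=0.5):
    """Chance that a no-skill guesser (each call right with probability p) gets at least k of n right."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, `p_value_at_least(k, 29)` with David’s actual count of correct calls would give the one-sided chance that he got lucky under these assumptions.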

One more thing. If someone wrote an article linked to on BBTF, proclaiming that “The sky is blue,” would Emeigh, Treder, Davis, Dial, et al. post 100 times trying to refute that assertion?  Another reason I don’t participate in that site.


#12    Nick      2009/10/06 (Tue) @ 22:57

The only thing that the “study” is lacking, which it would be nice to know (the data is there for someone to compute it though), is the significance of his results - IOW, the chances that he got lucky and that he had no skill at all.

Yes, that’s what I was really trying to say.  I have little doubt that David’s gut is significantly better than average; however, we can’t know how good it is or how much of a difference it makes without proper testing.  To do that properly, we would need to figure out the expected variance of human forecasts; hence my proposed experiment.


#13    Peter Jensen      2009/10/07 (Wed) @ 09:51

MGL - Saying that he “cherry picked” the players, and therefore the exercise is not legitimate, is almost certainly not correct.

When I said that David “cherry picked” his sample I was not trying to diminish David’s accomplishment, and I certainly did not say or imply that “therefore the exercise is not legitimate.” I was commenting on Tango’s suggested method of incorporating David’s gut instincts into his projection algorithm, as is clear if you read the sentence that follows my cherry-picking comment.

