THE BOOK--Playing The Percentages In Baseball


Thursday, August 14, 2008

Bayes’ Theorem

By Tangotiger, 10:13 AM

Victor looks at Bayes’ Theorem for prospect valuation.


#1    MGL      (see all posts) 2008/08/14 (Thu) @ 12:38

I wrote this in the comments section:

Nice work!

I have to think about your whole methodology for a while, but off the top of my head, one of the problems is that you would like the prior and current probabilities to be independent when doing this kind of Bayesian analysis.

The prospect rankings are clearly not independent of the player’s stats, which, as I said, is problematic to your analysis.  I don’t know how much of a problem it is and what the solution might be.


#2    dcj      (see all posts) 2008/08/14 (Thu) @ 18:55

I agree with MGL.

If you have two players with the same prospect ranking, but player A has much better stats than player B, probably the scouts prefer the tools of B. (Say equal ages and levels to keep things simple.) In Victor’s approach, you start with the ranking and then adjust by the stats, so A will end up looking better than B. Maybe that is right, but only if the rankings are biased in favor of toolsy prospects.


#3    dq      (see all posts) 2008/08/14 (Thu) @ 20:04

Shouldn’t you first run them separately and see how much is prospect (tool) based and how much is stat based? I would assume that has been done before.


#4          (see all posts) 2008/08/15 (Fri) @ 11:13

dq, I’m not quite sure what you mean by “run them separate.”


#5    Alex      (see all posts) 2008/08/17 (Sun) @ 09:32

This was a really thought-provoking article.  Could an approach like this be used to determine the likelihood that a player’s dropoff in stats is due to an unannounced injury vs. plain old variance?  Or the likelihood that a young player’s statistical breakout is due to a change in his true talent level as opposed to him just having a lucky streak?


#6    MGL      (see all posts) 2008/08/17 (Sun) @ 19:43

Alex, the answer is no and yes, although if we know how often a player has an injury and how that affects his stats, then the answer is yes and yes.
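To make the "yes and yes" case concrete, here is a minimal sketch of the two-hypothesis calculation, assuming we somehow know the injury rate and the performance rate under each hypothesis. Every number below is made up for illustration, the function names are hypothetical, and treating each plate appearance as a simple success/failure binomial trial is a simplifying assumption.

```python
from math import comb

def binom_pmf(successes, trials, rate):
    """Probability of exactly `successes` in `trials` at a fixed success rate."""
    return comb(trials, successes) * rate**successes * (1 - rate)**(trials - successes)

def prob_injured(successes, trials, p_injury, rate_healthy, rate_injured):
    """Posterior probability of the injury hypothesis via Bayes' theorem."""
    like_injured = binom_pmf(successes, trials, rate_injured)
    like_healthy = binom_pmf(successes, trials, rate_healthy)
    numerator = p_injury * like_injured
    return numerator / (numerator + (1 - p_injury) * like_healthy)

# Illustrative assumptions only: 10% of players are hiding an injury,
# a healthy player succeeds 33% of the time, an injured one 28%.
slump = prob_injured(successes=25, trials=100, p_injury=0.10,
                     rate_healthy=0.33, rate_injured=0.28)
normal = prob_injured(successes=33, trials=100, p_injury=0.10,
                      rate_healthy=0.33, rate_injured=0.28)
print(f"P(injured | slump)  = {slump:.3f}")
print(f"P(injured | normal) = {normal:.3f}")
```

The slump raises the injury probability above the 10% starting point but nowhere near certainty, because 100 PA of plain variance explains a slump like that fairly well on its own. Without real values for the injury rate and the injured performance level, the calculation cannot be done, which is the "no" half of the answer.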

BTW, I wrote this in the comments section:

Victor wrote in the article:

Lichtman’s MLEs (using his linear weights) and Fox’s SFR have LaRoche as a 4.2 WAR over 1,056 plate appearances from 2005-2007. Using a .338 wOBA as major league average, LaRoche’s equivalent wOBA is .349.

Coming into 2008, LaRoche was rated as the No. 31 prospect overall by Baseball America, the No. 14 prospect overall by Kevin Goldstein, and the No. 22 prospect overall by Deric McKamey. Taking an average, we get LaRoche as a No. 22 prospect and in the 11-25 prospect ranking group.

You have a problem in that the 2008 rankings are clearly going to be based in part on the 05-07 stats.

I don’t know if that is what is happening with the entire study, but if it is, it is problematic.

If the rankings are from the 90’s and the stats are after that, there is no problem.  I don’t know why the fact that Baseball America did the rankings is relevant.  They are going to “incorporate” a player’s minor league (and amateur) stats in their rankings, just like everyone else does.


#7          (see all posts) 2008/08/18 (Mon) @ 15:12

FWIW, I ran a correlation between prospect ranking and a minor leaguer’s MLE raw LWTS/PA and regressed LWTS/PA for all top 100 prospects who played in at least AA from 1990-1999.  The r was -.34 for raw LWTS and -.29 for regressed LWTS.  The negative correlation in this situation means that the higher a player’s LWTS, the lower his ranking number (closer to No. 1), which is good.  Of course, this result was expected.  Though I’m not sure how much we can separate prospects being highly rated because they performed well from highly rated prospects outperforming lower rated prospects.

“If the rankings are from the 90’s and the stats are after that, there is no problem.” I’m not sure what you mean by this, MGL.  A prospect group’s values were determined by finding all prospects in that group and finding their production.


#8    MGL      (see all posts) 2008/08/19 (Tue) @ 01:49

Victor, I will email you when I get back in town.


#9    MGL      (see all posts) 2008/08/19 (Tue) @ 02:02

Again, from what year are the prospect rankings taken versus from what years are the MLE’s?

What good is a correlation between rankings and MLE’s if the rankings are based, at least in part, on the minor league stats (and hence the MLE’s)?

That is the point I keep making and the question I keep asking.

If a minor leaguer puts up good numbers and because of that gets a good ranking, of course there is going to be a significant correlation.  And what is that correlation going to show you?  Nothing.  However, if the rankings are taken before the MLE’s for all players, then your correlations are meaningful to the extent that we can probably use one to predict the other.

The same thing is also true in your Bayesian model.  If you are using the rankings to determine the distribution of true talent for each player (which you are), but those rankings are based on the MLE’s in part, then the resultant Bayesian probabilities are almost worthless, because as I said, the two things have to be independent.  The two things I am talking about are the sample stats (the MLE’s) and whatever you use to determine the true talent distribution.

For example, let’s say that we know that an A prospect has a true talent distribution of XA, a B prospect XB, etc.  That is what you do in your methodology, which is fine.  First you determine the true talent distribution of all the various level prospects.

Now we have a player who plays at a level of L for P PA.  You go through your binomial calculations: “What is the chance that a certain true talent level player plays at exactly L for P PA?”, etc., and you multiply or weight those numbers by the distribution of all those true talents.
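The calculation just described can be sketched roughly as follows. This is an illustration under stated assumptions, not Victor's actual code: the prior probabilities for the prospect group are invented, the three talent levels are a coarse stand-in for a full distribution, and performance is treated as a plain binomial success rate (which a stat like wOBA only approximates). The function name `posterior` is hypothetical.

```python
from math import exp, log

def posterior(prior, successes, trials):
    """Update a discrete true-talent prior with a binomial sample.

    prior: dict mapping a true-talent success rate to its prior probability.
    Works in log space so large samples do not underflow; the binomial
    coefficient is the same for every talent level, so it cancels out.
    """
    log_weight = {
        rate: log(p) + successes * log(rate) + (trials - successes) * log(1 - rate)
        for rate, p in prior.items()
    }
    top = max(log_weight.values())
    unnormalized = {rate: exp(v - top) for rate, v in log_weight.items()}
    total = sum(unnormalized.values())
    return {rate: v / total for rate, v in unnormalized.items()}

# Hypothetical prior for one prospect group (illustrative numbers only).
group_prior = {0.300: 0.25, 0.330: 0.50, 0.360: 0.25}

# A player from that group who succeeds 180 times in 500 PA (a .360 rate).
updated = posterior(group_prior, successes=180, trials=500)
for rate, prob in sorted(updated.items()):
    print(f"true talent {rate:.3f}: prior {group_prior[rate]:.2f} -> posterior {prob:.3f}")
```

The weighting step is only valid if `group_prior` was built without looking at the same 500 PA that are being fed in as the sample.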

Well, if the determination of that distribution was in part or in whole based on those sample numbers, which it will be if the prospect rankings are based in part or in whole on those sample numbers, then the methodology is severely flawed and your results are not meaningful.

Obviously all of the players who have good numbers will be high prospects and all of the players with bad numbers will be low prospects.  We want some low prospects to have good numbers and high prospects to have bad numbers and everything in between.

So again, if anyone is going to use this methodology, which is a great one BTW, you must make sure that the prospect rankings or whatever it is you use to determine the distribution of true talent for a player (what population he comes from) are independent (as much as possible) from the sample stats that you are using.
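A quick simulation shows why the source of the rankings matters. It is a toy model with invented parameters, not anyone's real data: "scout grades" are a noisy read on true talent that never sees the sample stats, while "stat-driven grades" are just the sample stats with a little noise added. Independence here means independent of the sample once true talent is fixed, which is an assumption of this sketch.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_players=2000, pa=400, seed=0):
    rng = random.Random(seed)
    # Invented talent spread: mean .330 success rate, SD .020.
    talent = [rng.gauss(0.330, 0.020) for _ in range(n_players)]
    # Sample stats: observed success rate over `pa` binomial trials.
    stats = [sum(rng.random() < t for _ in range(pa)) / pa for t in talent]
    # Grades from scouting alone: talent plus noise, never touching the stats.
    scout_grade = [t + rng.gauss(0, 0.020) for t in talent]
    # Grades that lean on the stat line: the stats themselves plus small noise.
    stat_grade = [s + rng.gauss(0, 0.005) for s in stats]
    return pearson(scout_grade, stats), pearson(stat_grade, stats)

r_scout, r_stat = simulate()
print(f"scout grades vs. stats:       r = {r_scout:.2f}")
print(f"stat-driven grades vs. stats: r = {r_stat:.2f}")
```

The stat-driven grades correlate almost perfectly with the stats because they are the stats, so that correlation says nothing about talent. In this toy setup only the scout grades add information the sample does not already contain, which is what makes them usable as a prior.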


#10          (see all posts) 2008/08/19 (Tue) @ 11:24

MGL, the prior distribution of prospects is based on 1990-1999 prospect rankings.  I didn’t use any MLEs for this.  I just looked up all the prospects in certain groups and found their WSAB that they produced in their first 6 years in the major leagues.

In my example, the sample stats I used were from Andy LaRoche.  His stats were from 2005-2007, and his numbers were not used to formulate the prior distribution, as he has not had enough playing time in the majors for that.

