THE BOOK--Playing The Percentages In Baseball

Thursday, February 25, 2010

Prospect Lists

By Tangotiger, 09:11 PM

Good job by Bryan.  I agree, and this would apply to MLB players as well.  The most valuable property is Evan Longoria.  I don’t think anyone can even be close.

So what about prospects?  Again, we only care about those years of team control.  So, the 6-7 years that a team gets a player, that’s what you care about.  Add up all the potential WAR while pre-free agency, and that’s how you rank your prospects.
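
For anyone who wants to try the ranking described above, here is a minimal sketch in Python. The prospect names and WAR figures are made up for illustration, and the six-year window is an assumption (the post says 6-7 years), so treat `control_years` as a knob rather than a fixed rule.

```python
from typing import Dict, List, Tuple


def rank_by_team_control_war(
    projections: Dict[str, List[float]], control_years: int = 6
) -> List[Tuple[str, float]]:
    """Rank prospects by summed projected WAR over the team-control window."""
    totals = {
        name: sum(war_by_year[:control_years])
        for name, war_by_year in projections.items()
    }
    # Highest total pre-free-agency WAR first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical projections: WAR per MLB season, starting with the debut year.
projections = {
    "Prospect A": [1.0, 2.0, 3.0, 3.5, 4.0, 4.0, 5.0],
    "Prospect B": [2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5],
}
ranking = rank_by_team_control_war(projections, control_years=6)
for name, total in ranking:
    print(f"{name}: {total:.1f} WAR under team control")
```

Note that Prospect A's seventh season is ignored: anything after the control window belongs to whoever signs him as a free agent.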


#1          (see all posts) 2010/02/26 (Fri) @ 10:47

The big problem with the methodology seems to be factoring in risk.  It might be worthwhile to try to compile a Fans Prospect Report.  Fans would be asked to assess bust and injury risk (and breakout, etc) for whatever prospects they wish to evaluate (you could just throw the BA top 30 names up there to cover everyone).  Such a report would undoubtedly be very optimistic, but 5 years from now, we should start getting enough data to compare predicted bust/injury risk to actual rates.  Then from there it’s a matter of adjusting.  It’s not a perfect way of going about solving the issue, but it’s a nice starting place.
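
The "compare predicted to actual, then adjust" step could be sketched roughly as below. This is only an illustration: the records are invented, "bust" would need a real definition, and the four-bucket split is an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bust_calibration(
    records: List[Tuple[float, bool]], n_buckets: int = 4
) -> Dict[float, Tuple[float, float, int]]:
    """Compare fan-predicted bust probability with the observed bust rate.

    records: (predicted_bust_probability, actually_busted) per prospect.
    Returns {bucket_start: (mean_predicted, actual_rate, count)}.
    """
    buckets = defaultdict(list)
    for predicted, busted in records:
        # Clamp so a prediction of exactly 1.0 lands in the top bucket.
        index = min(int(predicted * n_buckets), n_buckets - 1)
        buckets[index / n_buckets].append((predicted, busted))
    summary = {}
    for start in sorted(buckets):
        rows = buckets[start]
        mean_predicted = sum(p for p, _ in rows) / len(rows)
        actual_rate = sum(1 for _, busted in rows if busted) / len(rows)
        summary[start] = (mean_predicted, actual_rate, len(rows))
    return summary


# Hypothetical data: fans said 25% bust risk, but half of those prospects busted.
records = [
    (0.25, True), (0.25, True), (0.25, False), (0.25, False),
    (0.75, True), (0.75, True),
]
calibration = bust_calibration(records)
for start, (predicted, actual, count) in calibration.items():
    print(f"bucket {start:.2f}: predicted {predicted:.2f}, actual {actual:.2f}, n={count}")
```

If the fans really are too optimistic, the actual rate will sit above the predicted rate in most buckets, and that gap is what you would adjust for.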

Fangraphs would obviously be the place to do this.

At the very least, it’s a fun experiment.


#2    philly      (see all posts) 2010/02/26 (Fri) @ 11:28

I think that’s a great idea.  I’ve been trying to get something like that up and running at SoSH with only moderate success.  Here’s what I’ve done so far:

http://sonsofsamhorn.net/index.php?showtopic=54086

I think it would be really interesting and if you could reach out to the prospect hound web sites I think you could make it work.  Somebody try convincing Tango…


#3          (see all posts) 2010/02/26 (Fri) @ 12:17

My one problem with the idea of it all is that assessing “risk factor” or “bust potential” or whatever we want to call it would be really hard for fans. In my dealings with prospect hounds for certain teams, I think fans have a good idea what kind of player a guy could become, but an unrealistic sense that he actually will become that.

I’d love to do risk factor numerically, but as mentioned in the FG comments, I too wonder if just creating a few different buckets to group prospects into isn’t a nice short-hand way of doing it.


#4    tangotiger      (see all posts) 2010/02/26 (Fri) @ 12:35

philly, if you like, I can post my email to you and we can have a conversation on it.  It might be good.


#5    philly      (see all posts) 2010/02/26 (Fri) @ 13:20

Sure.

I made a suggestion to Tango that this type of prospect report would potentially yield a lot of interesting data and would naturally fit into his wisdom of the crowd approach to sabermetrics.

Tango’s response:

philly,

Actually, I have thought about that for Mariners purposes.  The Mariners
blogosphere is filled with bloggers and commenters, on par, or bigger,
than Redsox Nation.  I figured I could leverage them.

The key though is that it has to be based on NON-NUMBERS.  To leverage
fans, I need them to give me what I don’t already have.  I don’t want them
to interpret someone’s SLG or K/BB ratio, etc.  I want them to tell me
what they think.  So, I don’t like that you listed data for players.  Only
their name.

Anyway, expanding this for every team is no big deal.  There are two issues:
1. Selecting the list of players for each team
2. Getting enough people to participate

I ran it for college for 2 years:
http://tangotiger.net/college/

And the participation was very low.  So, this can work for M’s, Redsox,
maybe Mets/Yanks.  That’s pretty much it.

Tom

My attempt to convince:

I do remember the college project.  I’m not a fan of college baseball so I hadn’t followed it though.  I didn’t realize that it didn’t draw many people.  I’m not sure a lack of interest in college players would necessarily doom a minor league project.  I agree that Ms, Sox and the NY teams would be the easiest to drum up traffic.  But from time to time I’ve stumbled across sites dedicated to the farms of other teams.  I think the pull of a “Tango project” would be able to reach some of those sites.

I also think you’re missing the target audience a bit by thinking in terms of team specific presence on the web.  I see prospect sites like John Sickels’ to be one of the primary places to draw from.  I think there are a lot of posters on sites like that who would end up rating many prospects from all teams.  I’m in a deep keeper strat league and there are plenty of owners who are sort of interested in their favorite team’s prospects, but intensely interested in their own strat team prospects.  I think there are a lot of fantasy players who feel like they are experts on their prospects.  You’ve said that you need ~15 votes to get good data.  If we can tap into those fans I think we can get 15 votes for a pretty broad range of prospects, not just those from the teams with big web communities.

Since Victor used Sickels grades for part of his work and Sickels publicly publishes team top 20s, I would think that would be a pretty good place to start.  It would help integrate these results into that framework and 20 prospects per team is enough that you can be pretty sure you’re capturing most future good players and you’re not overwhelming the voters.

That would be a potential pool of 600 prospects.  I’d love to have enough votes for all of them to provide reasonable data.  But it may be that only 150-200 players generate a lot of votes.  Those might be the players from the web popular teams and also the usual suspect candidates for most top 100 lists.  Even at that level I think there would be useful data.  I’d like to see a comparison of top 100 lists from BA, BP, et al with their straight ranking approach to a list generated by this approach.  You may be right that we won’t be able to get 15 votes for the 11th to 20th best prospects in the Nationals farm system.  I guarantee we’d get that many for Strasburg and Derek Norris.  Everybody who plays in a fantasy league with prospects will know and have an opinion on them.

philly

I wouldn’t guarantee that great data would come from the project.  I have concerns about how well people will think about risk - although I feel that could possibly be adjusted on the back end in the way Fan Projections often are. 

I do think it’s an opportunity that could help us learn about prospect valuations and development.  Given the tremendous increase in attention for prospects both from fans and the teams who now hold onto them for dear (financial) life, I think we should try to grab every opportunity to help us think about these players.


#6    Kent Bonham      (see all posts) 2010/02/26 (Fri) @ 13:20

If this is the place to go to help something like that pick up steam, count me in. It would be great to see this take off.

I say this without even knowing what “it” is yet, but Bryan+tango+philly is good enough for me!


#7          (see all posts) 2010/02/26 (Fri) @ 14:02

I will say this: when I joined FanGraphs, there was discussion about creating a community ... something. We just didn’t know what would work best, or be useful. I’d LOVE to be part of a conversation trying to find what is most useful, and then I’d be happy to go to the FanGraphs hierarchy and present our ideas.


#8          (see all posts) 2010/02/26 (Fri) @ 14:10

What about creating a report that would have players listed, which fans would respond to this:

STARLIN CASTRO, SS, CUBS

MEDIAN K% FOR TEAM-CONTROLLED YEARS ___
MEDIAN BB% FOR TEAM-CONTROLLED YEARS __
MEDIAN XBH% (OR ISO?) FOR TEAM-CONTROLLED YEARS __
MEDIAN BABIP FOR TEAM-CONTROLLED YEARS ___
MEDIAN POSITION ADJUSTMENT + UZR FOR TEAM-CONTROLLED YEARS __

From there, we have a program that does what I did with Castro, and computes WAR.

Then, a few other questions I can see:

-- MAX POTENTIAL WAR IN FIRST 7 YEARS
-- % LIKELIHOOD BECOMES EVERYDAY PLAYER


#9    tangotiger      (see all posts) 2010/02/26 (Fri) @ 14:57

To me, the #1 thing to ask is what position a player will most likely play.  It makes a huge difference if you have someone who is a SS in the minors, but really should be in LF.  Enormous.  Or 3B instead of RF, etc.  THAT’s the value of the fans.

His K rate, BB rate, etc.  I dunno.  Isn’t he going to rely on numbers?

I need him to tell me what the numbers don’t tell me.


#10    philly      (see all posts) 2010/02/26 (Fri) @ 15:10

Position is an interesting one to add in.

I’m concerned that asking for too much specific statistical information would be a barrier to entry for a lot of potential voters.  It’s one thing to drill down that deeply into a top prospect like Starlin Castro.  It’s quite another for a solid grade B prospect who’s #175 or something. 

I’ve tried to take the broad range of probabilities to come up with some expected WAR totals that are actually scaled to the pre-arb years.  Simply asking MAX POTENTIAL WAR is maybe a simpler way of doing so, but it has to be with odds for that level of success tied right into it.  You don’t want every high ceiling top prospect with a MAX POTENTIAL WAR of 30-36 or whatever.
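
To make the "ceiling with odds tied into it" point concrete, here is a small sketch of a probability-weighted WAR total. The outcome bands and every number in them are invented for illustration; this is not the scale used in the SoSH voting.

```python
from typing import List, Tuple


def expected_war(outcomes: List[Tuple[float, float]]) -> float:
    """Probability-weighted WAR over the pre-free-agency years.

    outcomes: (probability, WAR if that outcome happens); probabilities must sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_probability}, expected 1.0")
    return sum(p * war for p, war in outcomes)


# Hypothetical high-ceiling prospect: small chance of stardom, big chance of nothing.
high_ceiling = [(0.125, 32.0), (0.375, 8.0), (0.5, 0.0)]
# Hypothetical safer prospect: lower ceiling, much lower bust rate.
steady = [(0.25, 16.0), (0.5, 8.0), (0.25, 0.0)]

print(expected_war(high_ceiling))  # 7.0
print(expected_war(steady))        # 8.0
```

With these made-up odds the 32-WAR ceiling guy ranks behind the steadier player, which is exactly the kind of difference a MAX POTENTIAL WAR question alone would hide.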


#11    Jamie      (see all posts) 2010/02/26 (Fri) @ 15:41

Tango:

it sounds like you’re asking for us to all be scout evaluators if we can’t use numbers.  but the thing is that hardly any of us that are interested in these prospects will ever get a chance to see them and watch them play.  there MIGHT be 10 people for each team that could do that.  there just wouldn’t be enough sample size for this to work without using some sort of statistics.


#12          (see all posts) 2010/02/26 (Fri) @ 15:44

Tango/9: But in a community prospect list, you’re simply going to have people tell you things from the 2-3 sources they read. If we’re talking about a community scouting report, that’s one thing. I don’t think you’d have enough people.

Agreed about position being extremely important. Although I have a study I want to do about the readiness of corner players (old player skills) vs. the positional advantage of up-the-middle players. We’ll see…

Philly/10: I can see that.


#13    tangotiger      (see all posts) 2010/02/26 (Fri) @ 18:29

Let me ask then: on what basis will people be putting in their evaluations?

Are you asking them to interpret the same numbers I see?  If so, then it’s a non-starter.  Why would you need them to interpret the same numbers?

Are you asking them to collate all the prospect lists they see?  Again, what’s the point there?

Are you asking them to tell me what they read on the internet?  That’s a bit better, but only if it’s about specific traits of the player.

Are you asking them to tell me what they actually see?  Now there’s the value right there.

And if I only wanted to know one thing from the community, it’s this: what position do you see him play at age 25-27?  That is it.  That’s all I want.

Basically, what is the value-added of the fan?

With the scouting reports, it’s them telling me, in his rookie year, that Ryan Zimmerman is a fielding standout.  How long does it take Dewan or MGL or Pinto to have enough data to make that claim?

When will the data tell you enough about Elvis Andrus?


#14    philly      (see all posts) 2010/02/26 (Fri) @ 22:27

Let me take a shot at some of these.

Let me ask then: on what basis will people be putting in their evaluations?

This is going to come off too cute, but literally everything they know about the specific prospect in question but also baseball in general.  One of the things that has been true and really gratifying at SoSH is that some of the voters are finding that just the process of thinking thru the probabilities is forcing them to think about prospects in different ways.  The old ways are ok, but they leave plenty of room for improvement.

People who will be motivated to vote will do so with some knowledge of the numbers, the various scouting reports available and their own understanding of historical risks of prospects.  A small number of people will have actually seen them play.  Hard core prospect hounds will have sought out videos on many of the top prospects.  What’s novel is how these many individuals synthesize that information when they’re specifically asked to think about risk and not just ceiling.

Every prospect ranker will, at some point, say that his rankings are the result of a judgement of a prospect’s ceiling and his chance to reach that ceiling.  They all say that, but I don’t see much evidence that any of them try to combine those two factors in any kind of a systematic way.  You seem to be fixated on what novel information the voters can bring to the table.  I think the process we set up is the novel thing and if done right it will use the knowledge of the voters to create a novel and interesting data set.

Are you asking them to interpret the same numbers I see?  If so, then it’s a non-starter.  Why would you need them to interpret the same numbers?

I basically agree with one caveat.  We’re going to be asking about a lot of prospects with very little pro track record and/or numbers at the very lowest levels of the minors.  The numbers of prospects in short season ball and even lo-A need something to help with the interpretation.  The best thing is top notch scouting.  If we’re lucky this might be a helpful thing for players at those levels.

Are you asking them to collate all the prospect lists they see?  Again, what’s the point there?

No.  For me anyway, this largely has grown out of frustration with list collations and just randomly asking people for their team top 10s to make consensus lists with no new information.

Are you asking them to tell me what they read on the internet?  That’s a bit better, but only if it’s about specific traits of the player.

I think I see where you’re going with this - something along the lines of how you break down fielding into components?  I don’t think there’s enough detail in the scouting reports we read online to do that.  But what people read - how they interpret those reports - will be an important factor in how people vote.  For example, it’s widely reported that Donavan Tate has incredible tools and serious questions about how well he can actually hit.  Every ranker except Keith Law has focused on the tools and made Tate a top 35 prospect.  I suspect many people will interpret that scouting report with much more weight on the questionable hit tool and use that to boost his risk profile.  If a lot of people do that, he won’t be able to rank very highly.  Whether that community embrace of risk is better than what the rankers are using I don’t know.  But it gives us something to compare that might reveal interesting differences.

Are you asking them to tell me what they actually see?  Now there’s the value right there.

Yeah, I’d love to, but the votes aren’t there.

And if I only wanted to know one thing from the community, it’s this: what position do you see him play at age 25-27?  That is it.  That’s all I want.

As you recall when I pitched this to you, one of the things that I stressed is that your upfront involvement would let you help shape the process.  You want that?  Do it, it’s yours.  But do the other part as well just in case there’s something interesting that will come from that, even if you don’t see it right now.  ;)

Basically, what is the value-added of the fan?

It’s definitely not as clear cut as your example of the fan scouting report.  I think we’d have to go through the process and see how it differs from what currently exists.  I’m pretty sure it will.  And then we’ll have to see if there’s value-added in those differences.  Dealing with these young prospects, it’s going to take time. 

You’ve expressed interest and excitement in some of the ways that Victor Wang’s retrospective probabilities have been used to do things like put valuations on farm systems.  You linked here to someone who had done that with Sickels’ Top 20s.  That was interesting to look at, but we won’t really know how good those rankings were for many years.  That’s a big issue with anything involving prospects.  If done well, this process of creating prospective probabilities could generate better versions of those kinds of lists.  Will we know that at the end of the 2010 season? Nope, but I’d be pretty surprised if we didn’t learn enough to feel like the effort was worthwhile and we’d all be willing to tee it up again for 2011.

When will the data tell you enough about Elvis Andrus?

It’ll be a while before you can be confident about the data about a single player.  Honestly, it will take years to really validate that.  But I think the data in the aggregate might tell us useful things very quickly.

I’m starting to feel a little Ahab on this…


#15    Tangotiger      (see all posts) 2010/02/26 (Fri) @ 23:15

I was thinking about this while watching the first period.

How about we follow the basic scouting guidelines of 20-80, with 50 being an average MLBer, and we focus on these traits:

Power
Strike Zone Judgement
Speed
Fielding - Range/Catching
Fielding - Throwing
Position

So, I’d rather they tell me something along those lines, and then I/WE will convert this into a batting and fielding line, either with our best guess, or wait a year or three and run a regression.
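
A rough sketch of how ballots on those six traits might be pooled into one community line per player follows. Using the median for the graded traits and the most common answer for position is my assumption, not something settled in this thread, and the ballots are invented.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Union

# The five graded traits from the list above; position is handled separately.
TRAITS = ("power", "strike_zone_judgement", "speed", "range_catching", "throwing")

Ballot = Dict[str, Union[float, str]]


def consensus_report(ballots: List[Ballot]) -> Ballot:
    """Pool fan ballots: median 20-80 grade per trait, most common position."""
    if not ballots:
        raise ValueError("need at least one ballot")
    report: Ballot = {}
    for trait in TRAITS:
        grades = [ballot[trait] for ballot in ballots]
        if any(not 20 <= grade <= 80 for grade in grades):
            raise ValueError(f"{trait} grades must be on the 20-80 scale")
        report[trait] = median(grades)
    positions = Counter(ballot["position"] for ballot in ballots)
    report["position"] = positions.most_common(1)[0][0]
    return report


# Hypothetical ballots from three voters for one prospect.
ballots = [
    {"power": 60, "strike_zone_judgement": 45, "speed": 55,
     "range_catching": 50, "throwing": 60, "position": "SS"},
    {"power": 55, "strike_zone_judgement": 50, "speed": 60,
     "range_catching": 45, "throwing": 55, "position": "SS"},
    {"power": 70, "strike_zone_judgement": 40, "speed": 50,
     "range_catching": 40, "throwing": 60, "position": "3B"},
]
report = consensus_report(ballots)
print(report)
```

Turning the pooled grades into an actual batting and fielding line is the part that would need the best guess now and the regression later.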


#16    erik      (see all posts) 2010/02/28 (Sun) @ 21:21

@Tango

I’m a little late to the party, but I love the idea of a community scouting report using the 20-80 scale, preferably with FanGraphs where the audience is large/diverse. I think as with the fan projections, people should identify their rooting interests so we can taper down homer-ism, if possible.

I would say convert the batting and fielding lines with our best guess, then do the necessary regressions three years later. Fudged lines would probably be fine for now. The hitting grades have pretty much already been laid out and are common knowledge, and I think we can all come to an agreeable consensus as to what makes up the other grades rather easily.

This would create a wealth of information for fantasy players in keeper leagues, and just fans of teams alike.


#17    Tangotiger      (see all posts) 2010/02/28 (Sun) @ 22:01

Philly,

if you want, post a link to SOSH where you discussed what you wanted to do, and let’s see if we can at least come to some common ground.


#18    dkappelman      (see all posts) 2010/02/28 (Sun) @ 22:49

Doing a 20-80 scale for 6 fields would be very easy to do from a coding standpoint (the code is basically already there from the fans projections) and is something that could be run all year long.

Marc Hulet and I (and Bryan too) have talked a little about doing some sort of community prospect thing and I know Marc has given it a decent amount of thought so I’ll chime him into this thread and maybe we can churn something out before the minor league season starts.


#19    Tangotiger      (see all posts) 2010/02/28 (Sun) @ 22:53

I’d like to hear what Marc’s been considering…


#20          (see all posts) 2010/03/01 (Mon) @ 00:08

Well I’m definitely glad that other people are trying to come up with different projects.  As much as I like reading prospect reports, I can’t help but think the current paradigm is a little bit played out.  I’m a big believer in the online community trying to come up with something new that will complement what already exists in so many places.

My concern about following basic scouting guidelines is that it implies that it would be a community scouting project similar to the Fan Scouting Report.  But it clearly wouldn’t be that.  It would be re-interpreting public reports and stats in a way that mimics scouting reports.

I think that could generate interesting data, but I think it would have to be carried out a few years before we really understood what that data was telling us.  And the piece of the puzzle that I really want to try to get at is projecting risk with success probability bands. 

Here is the intro to the project I’ve tried to start at SoSH and talks about some of these issues:

http://sonsofsamhorn.net/index.php?showtopic=53323

Here is a preliminary look at the data that has been generated so far with a modest number of voters:

http://sonsofsamhorn.net/index.php?showtopic=54086

One of the original inspirations for me is the Shandler scouting book originally written by Deric McKamey.  If anybody is familiar with that work, they use a simple two grade system with a number grade for ceiling and a letter grade for the probability to reach it.  I’ve always thought that really stood out in comparison to what other people who rate prospects do.

But I’ve just realized an even better model for what I’d like to do.  BP seems to have royally screwed up their PECOTA roll out, but I’ve always loved the way they presented some of that data on the PECOTA cards themselves.  In particular the Stars and Scrubs chart.  Now I’ve never seen them validate how good their multiyear modeling is, but the charts look fantastic.  They have 6 categories from Superstar to Drop and assign a probability for the next several years of the prospect’s career in each of those 6 categories.

I’d love to come up with a community way to generate those kinds of prospective success probabilities.  Done well, that would really complement the standard way rankings are done.  And unlike BP we’d test to see if these multiyear projections actually make sense.

Naturally, I think something pretty close to my idea is best, but I’m on board with building momentum for something new.


#21          (see all posts) 2010/03/01 (Mon) @ 10:56

Glad to see this thread come alive. Philly: I love that you’re doing it on the organizational level. I think that could pose some really interesting findings.

For what might be done on FanGraphs—well, I shouldn’t say that—for what I would suggest, I do somewhat agree when Tango said this:

And if I only wanted to know one thing from the community, it’s this: what position do you see him play at age 25-27?  That is it.  That’s all I want.

I would say that I want to know his position for his team-controlled seasons ... all six of them. Again, the peak isn’t very important to me. If Justin Upton peaks from age 25-27, what does this mean for the Diamondbacks??

I would add one more, which would be power potential. I think the power development of minor leaguers is not nearly as analogous as their development in other statistics: patience, contact, even defense. Numbers just can’t tell apart the player who hits 40 doubles and will see 15 of them fly over the wall next year from the player whose 40 doubles will always stay inside the park. Not yet, at least.

I think there’s an argument to be made for doing what Erik did on Twitter—asking for a quick 20-80 scale on defense, and in this case, I say why not?

And one more, which sort of mocks what Philly suggests: how about a confidence grade that the player will have six full Major League seasons?

So here’s what I propose we ask the “Fans”:

-- Likely position for team-controlled years
-- Likely power development during those yrs
-- 20-80 grade on defense at that position
-- A-F confidence grade the player will have SIX FULL seasons
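
If this went to code, a ballot for those four questions might be represented along these lines. It is only a sketch: the field names are hypothetical, the proposal does not give a scale for power development so a 20-80 grade is assumed here, and "A-F" is assumed to mean the school-style letters A, B, C, D, F.

```python
from dataclasses import dataclass

# Assumption: school-style letters, with no "E".
CONFIDENCE_GRADES = ("A", "B", "C", "D", "F")


@dataclass(frozen=True)
class FanBallot:
    player: str
    likely_position: str        # position during the team-controlled years
    power_grade: int            # assumed 20-80 scale (not specified above)
    defense_grade: int          # 20-80 grade at likely_position
    six_season_confidence: str  # A-F confidence in six full MLB seasons

    def __post_init__(self) -> None:
        for label, grade in (
            ("power_grade", self.power_grade),
            ("defense_grade", self.defense_grade),
        ):
            if not 20 <= grade <= 80:
                raise ValueError(f"{label} must be between 20 and 80, got {grade}")
        if self.six_season_confidence not in CONFIDENCE_GRADES:
            raise ValueError(
                f"confidence must be one of {CONFIDENCE_GRADES}, "
                f"got {self.six_season_confidence!r}"
            )


# Hypothetical ballot; the values are made up.
ballot = FanBallot("Example Prospect", "SS", 45, 55, "B")
print(ballot)
```

Rejecting bad values at entry keeps the later aggregation simple, since every stored ballot is already on the expected scales.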


#22          (see all posts) 2010/03/01 (Mon) @ 11:09

I also want to note that the last bullet point there (the confidence grade), is something I hope to do a study on this season. Hopefully, the study will pose some interesting results that will help change the way voters think about the likelihood of X prospect’s success.

Or maybe it won’t.


#23    Tangotiger      (see all posts) 2010/03/01 (Mon) @ 11:16

I like Bryan’s suggestions a lot.

I think “Speed” is an easy one to add. 

The only other one I mentioned that’s not on his list is StrikeZone Judgement.

That’s for position players.

What about pitchers?


#24          (see all posts) 2010/03/01 (Mon) @ 11:22

Well, just like position for hitters, you’d want some degree of recognition on whether the pitcher was bound to end up a starter or reliever.

Do you want to go to the fans in guessing the likelihood of pitcher injury? I don’t know…

Likely ultimate velocity?

20-80 grade on “out pitch”?

Trying to think of things I can’t learn from a box score.


#25    Marc Hulet      (see all posts) 2010/03/01 (Mon) @ 17:52

Here was my basic initial idea to David A on a possible community-scouting site…

I could see there being a main landing page with perhaps a Top 25 prospects watch list and then 30 team pages for each organization with an extended list of prospects. At the end of each season I could do a Top 10 list of pitching prospects for each team with scouting reports based on what our readers/amateur scouts have noticed throughout the year… it would be interesting to see if a collection of amateur scouting reports (balanced out by our stats analysis) can hold up to (other established sites).

Perhaps we can devise a standard scouting document for people to print off and take with them to games (or for use while watching MiLB.com), with categories to rate such as ground-ball tendencies, first-pitch strike tendencies, control, strikeout ability, strikeout pitch, velocity, movement… and instead of using the 20-80 scout scale, maybe we can dumb it down to an A (70-80), B (50-60), C (30-40), and D (20 and below) report card.
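
The report-card idea maps onto the 20-80 scale in a few lines. The bands come straight from the paragraph above; how to treat in-between grades such as 45 or 65 is not stated, so this sketch assumes they fall into the lower band.

```python
def report_card_letter(grade: int) -> str:
    """Convert a 20-80 scouting grade to the A-D report card described above.

    A = 70-80, B = 50-60, C = 30-40, D = 20 and below.
    Assumption: half-grades (e.g. 65) drop to the lower band.
    """
    if grade >= 70:
        return "A"
    if grade >= 50:
        return "B"
    if grade >= 30:
        return "C"
    return "D"


for grade in (80, 65, 50, 40, 20):
    print(grade, report_card_letter(grade))
```

Going the other way (letter back to number) loses information, so it is probably better to store whatever the voter actually entered.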

With each player on the site, we can have a base profile, including their repertoire, draft background, college info, etc. as well as stats broken down in a game-by-game log with submitted scouting reports linked to each game. People could submit scouting reports initially to you and me, we could edit them and then post them.

Each day, we could highlight 5 top pitchers to watch that night, while also keeping a running tally of the Top 25 pitchers on the main page. Eventually it might be nice to add some video and perhaps some interviews with players, coaches and even pro scouts.

...it was a pretty bare-bones idea that you guys have already built upon nicely.

