
THE BOOK--Playing The Percentages In Baseball


Thursday, April 16, 2009

The Race Database

By Tangotiger, 11:00 AM

This post is only for those who see this from a clinical rather than a moral perspective.  Please ignore this post if you are in the latter group.

Can someone send me email addresses or post links to various race- or ethnic-based research, where the researcher compiled a database denoting the players by ethnicity, race, or skin color?

My plan is to do the following: 


Use the Wisdom of the Crowds to determine the race-ethnicity-skin color of every player in MLB (active, and perhaps going back in time to forever).

I am opening up the floor here to hear your perspectives on the best way to treat this so that it is not insulting to any one group of people.  (If everyone is insulted, that’s fine.  Worked wonders for Don Rickles.) For example, I was thinking of having a “skin” color in two or three dimensions, where you’d have pure white on one end, pure black on another, and, I’m not sure of the third color in the third point.

I’m not sure what race and ethnicity to put, other than what I find from the OMB.

If you are wondering about objectives for this Race Database, here are a few:
1. Do pitchers hit more or fewer batters of the same “x”, where x could be skin color, race, ethnicity, birthplace, size, or whatnot?
2. Is there an “x” change over time, and what is it?

A guy like ARod, for example, would test the limits of where he would fit in.  But the idea is that it’s all about perception.  I don’t really care how black or white or Hispanic he is.  What matters is how the baseball world perceives him.  Perception is reality.

Anyway, it’s a snap for me to create these Wisdom of the Crowds surveys.  The key, however, is that I get an intelligent crowd, and a small but sizeable crowd.  Five votes per player is good enough for these purposes.  But this project could get derailed by a--holes, and by those who will express moral outrage at anything that is race-based.

Anything you can suggest that limits exposure to a--holes, attracts a decent crowd to the project, and offends no particular subgroup is welcome.

#1    Colin Wyers      (see all posts) 2009/04/16 (Thu) @ 11:31

I’d suggest going with the official US Census categories of ethnicity:

-- American Indian or Alaska Native

-- Asian

-- Black or African American

-- Hispanic or Latino

-- Native Hawaiian or Other Pacific Islander

-- White

That way you avoid the problem of having to proffer your own definitions of what ethnicities there are.

I think this is a fantastic idea, and would love to see it go back at least 5 years or so. (Myself, I’d love to hook this up to my database of free agent salaries.)


#2          (see all posts) 2009/04/16 (Thu) @ 11:40

Send an email to those that have completed your other surveys, and ask for them to do this on an invite-only basis.  Personally, spending a few minutes completing a survey for you would be the least that I can do in exchange for all of the content that you have provided for me the reader.


#3    Rally      (see all posts) 2009/04/16 (Thu) @ 11:44

I’ve already coded every player before 1947.

Moses Fleetwood Walker - black
All others - white


#4          (see all posts) 2009/04/16 (Thu) @ 11:48

From Colin’s list above, I think that’s a checkbox system.  Which has some pluses, but I’d think if you coded in a way to assign percentages, such that they all add up to 100, it may work better.
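A minimal sketch of the percentage idea, assuming a voter enters any non-negative weights per category and the survey rescales them so they add up to 100. The function name `to_percentages` and the sample ballot are hypothetical, not part of any existing survey.

```python
def to_percentages(weights):
    """Rescale non-negative category weights so they sum to 100."""
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {category: 100.0 * value / total
            for category, value in weights.items()}


# Hypothetical ballot: the voter only has to get the proportions right.
ballot = {"White": 1, "Black or African American": 1, "Hispanic or Latino": 2}
shares = to_percentages(ballot)
print(shares)
```

Rescaling on the server side means a voter never has to do the arithmetic, and a plain checkbox ballot is just the special case where every ticked box gets a weight of 1.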

The other issue, in my opinion, is that the US Census list doesn’t get at skin tone at all.  Which is fine for its purpose, but my guess is that skin tone plays a large role in the bias you may find.  In other words, people may treat Jose Reyes and Miguel Cabrera differently due to their different skin tones, even though they may both get the same box checked off from the list above.


#5    Guy      (see all posts) 2009/04/16 (Thu) @ 11:51

I think you should find a way to measure race and ethnicity separately.  Depending on the issue you are studying, it may be important to distinguish between Hispanics who are of African descent and those who are not (and of course race is not truly an either/or proposition).  It would be a mistake to assume prior to a study that a light-skinned Cuban and a “Black” Dominican both belong in the same “Hispanic” category.


#6    Patriot      (see all posts) 2009/04/16 (Thu) @ 12:23

Rally, you forgot Welday Walker and his 5 games played :)

I like the idea of separate fields for race and ethnicity.  Especially in early baseball, it would be interesting to look at the ebb and flow of various “white” ethnicities (lots of Irish players in the late 1800s, Italians becoming prominent in the 1920s and growing for a couple of decades, as examples).  The rise and fall (at least that’s my perception, which could well be wrong) of Jewish ballplayers.  Etc.


#7          (see all posts) 2009/04/16 (Thu) @ 12:28

how about by socioeconomic backgrounds?  poor/inner city v poor/rural v rich/suburban etc.  you’d have to go by country or at least region or else make adjustments for relative wealth, but that will probably reveal more than separating players by their tanning ability.


#8    dan      (see all posts) 2009/04/16 (Thu) @ 12:50

Ken-- How would you do that? Not my town, but the town next to where I live has some families living in trailer parks and others living in big houses, driving BMWs. They all live in the same town, but have very different economic backgrounds. I’m just asking how you would find out the socio economic background of every player.


#9          (see all posts) 2009/04/16 (Thu) @ 13:03

dan - great question - no idea!  probably a lot of painstaking research that would not be feasible.

if it’s just skin tone and ethnicity, i would think you’d want as many choices as possible, but it will all get muddled when you are choosing where to draw the line.  a latino with african, european and native american ancestry whose skin tone is bronze/light brown/tan, is that a whole category?  what about light-skinned african americans and dark-skinned african americans, or just one category?  good thing baseball is not popular in the middle east.  arabs, persians, turks, georgians, greeks, kurds, egyptians, well, you get the idea.


#10    Tangotiger      (see all posts) 2009/04/16 (Thu) @ 13:46

I was definitely thinking of at least 2, if not 3 categories: ethnicity, race, skin-color.

And I was thinking of it in terms of a scale, so that someone can mark ARod on a two-dimensional scale of black on the left and white on the right as being say a bit to the right of the midpoint.  If I were to make it three-dimensional to include say North American aboriginals or “pacific islanders”, it might get more complicated.

Skin-color is I think the “perception is reality” focus.


#11    Tangotiger      (see all posts) 2009/04/16 (Thu) @ 13:52

Someone emailed me this link:
http://www.baseballresearch.com/hbp.htm


#12          (see all posts) 2009/04/16 (Thu) @ 16:28

Thanks for doing this!  I was hoping that Sean Lahman or Retrosheet would add it to their database at some point, but neither has.


#13    Dan Brooks      (see all posts) 2009/04/16 (Thu) @ 16:38

With respect to skin tone:
http://ase.tufts.edu/psychology/documents/pubsMaddoxCognitive.pdf

There are many other references, I just thought this might be a place to start.


#14          (see all posts) 2009/04/16 (Thu) @ 16:52

I would not mind helping with a Retrosheet-like project that helps code the players as Adam mentioned.  If the guidelines are set, it shouldn’t be too difficult to do a current team in an hour or less.


#15    dave smyth      (see all posts) 2009/04/16 (Thu) @ 18:13

I haven’t read the posts here, nor do I have any special insight…

But I suspect that skin color is the major factor in perception, and should get by far the highest weight. Even in cultures where everyone has the same racial basis, lighter skin seems to be preferred to darker skin as to inherent status. This is what I’ve been told by various types of people, such as (Asian) Indians and (American) blacks.


#16    Brian Cartwright      (see all posts) 2009/04/16 (Thu) @ 21:14

I was working on this a few months ago. I made a list of races, and also nationalities. I based nationality on where you learned to play baseball (Bert Blyleven is American).

Hispanic is a linguistic group, not a race. Hispanics can be white, black, native American, or any mix. I just keep it as a separate checkbox.

Players from Latin America can be a problem, as they are any of or a mix of white, black and native (European, African, Native). My impressions were that Mexicans were mostly Native, Cubans either white or black, Dominicans black, Puerto Ricans mixed white/black, Venezuelans mixed native/black/white... but there are variations.


#17    dan      (see all posts) 2009/04/16 (Thu) @ 21:18

What about a “spectrum rating” where you ask people to rate skin tone on a scale of 1-10. One end of the spectrum would be someone with very light skin like David Eckstein and the other end would have players that are very dark like Mike Cameron.

I’m not 100% sure what you’re trying to do with this, but here’s another suggestion. If you are trying to find something involving racial bias, it might not be 100% based on skin tone. David Eckstein is just as American as Rick Ankiel, but Ankiel has a darker skin tone. I think if a pitcher has a racial bias, it wouldn’t show up between Eckstein/Ankiel despite their differences in skin tone.


#18    jinaz      (see all posts) 2009/04/16 (Thu) @ 21:43

On the issue of skin tone...why not just measure skin tone directly?

The head-shot photos taken of major league players aren’t completely consistent in their exposure, lighting, etc.  But my impression is that they’re reasonably close.  You could sample a 5-pixel by 5-pixel spot on every player’s forehead from their headshot, and then just report the average brightness of those pixels.  This could be done in Photoshop, GIMP, or I imagine one of you code wizards could put together a quick program to make it reasonably automated.  You can find “facepacks” of MLB player faces on some of the baseball sim sites, like those for Out of the Park Baseball.  Historical packs are even available, though the consistency of photography of older images will vary a lot.

This won’t answer the ethnicity question, but it would give you a quantitative description that has at least something to do with what we perceive as skin tone.  I’ve used similar methods to document the brightness of visual signals in animals.
-j
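The sampling step described above could be sketched roughly as follows. This assumes the headshot has already been decoded into rows of (R, G, B) tuples and that someone has picked the forehead coordinates by hand. The function names, the Rec. 601 luma weights as the definition of "brightness", and the synthetic test image are all assumptions for illustration, not jinaz's actual method.

```python
def luma(rgb):
    """Perceived brightness of one pixel, using Rec. 601 luma weights (0-255)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def patch_brightness(pixels, cx, cy, size=5):
    """Mean luma of a size x size patch centered at column cx, row cy.

    pixels is a list of rows, each row a list of (R, G, B) tuples.
    """
    half = size // 2
    if cx - half < 0 or cy - half < 0:
        raise ValueError("patch falls outside the image")
    rows = pixels[cy - half: cy + half + 1]
    patch = [px for row in rows for px in row[cx - half: cx + half + 1]]
    if len(patch) != size * size:
        raise ValueError("patch falls outside the image")
    return sum(luma(px) for px in patch) / len(patch)


# Synthetic 9x9 "image" of uniform mid-gray, standing in for a real headshot.
image = [[(128, 128, 128)] * 9 for _ in range(9)]
brightness = patch_brightness(image, cx=4, cy=2)
print(round(brightness, 1))
```

Because lighting and exposure vary between photos, a single patch per player is noisy. Averaging a few patches, or several photos of the same player, would be a natural refinement.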


#19    Tangotiger      (see all posts) 2009/04/17 (Fri) @ 10:13

I responded to someone else, but I think it’s worth putting out there:

I agree about the subjectiveness of it.  From my standpoint, it becomes irrelevant.  For example, someone can say that ARod is “60%” black (race), 30% Hispanic (ethnicity), and 80% “tanned” (skin color).  I simply compile all the data, and report the results.

If a researcher comes along and wants to know about dark-skinned players, then he has the data, and he can slice/dice it as he needs to.

There are two main keys:
1 - link everyone to an ID (Lahman, Retro, MLBAM, etc)
2 - get as many contributors as possible
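As a rough sketch of how those two keys could fit together: each ballot carries a player ID and one or more numeric ratings, and the compiler averages the ratings per player once enough ballots are in. The ID strings, the scale names, and the `aggregate` function are hypothetical. The five-vote minimum is the threshold mentioned in the post.

```python
from collections import defaultdict
from statistics import mean

MIN_VOTES = 5  # "five votes per player is good enough", per the post


def aggregate(ballots, min_votes=MIN_VOTES):
    """Average each rating scale per player, skipping thinly voted players.

    ballots is an iterable of (player_id, {scale_name: numeric_rating}).
    """
    by_player = defaultdict(list)
    for player_id, ratings in ballots:
        by_player[player_id].append(ratings)

    results = {}
    for player_id, votes in by_player.items():
        if len(votes) < min_votes:
            continue  # not enough contributors yet
        scales = sorted(set().union(*votes))
        results[player_id] = {
            scale: mean(v[scale] for v in votes if scale in v)
            for scale in scales
        }
    return results


# Hypothetical IDs; a real run would key on Lahman, Retrosheet, or MLBAM IDs.
ballots = [("playerA", {"skin_tone": t}) for t in (4, 5, 5, 6, 5)]
ballots += [("playerB", {"skin_tone": 7}), ("playerB", {"skin_tone": 8})]
summary = aggregate(ballots)
print(summary)
```

Keeping the raw ballots alongside the averages would let a later researcher slice the data differently, for example by using medians or by dropping outlier voters.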

My Fan Surveys have worked pretty well in the past, and I’m thinking they’ll work here as well.  If I can get the right questions with the right possible answers, then we can come up with good biographical information for all active players.  And then work our way back.

***

If you think of potential questions, in addition to those I listed:
1. Do pitchers hit more or fewer batters of the same “x”, where x could be skin color, race, ethnicity, birthplace, size, or whatnot?
2. Is there an “x” change over time, and what is it?

Someone else offered the following:
3. Would this person have been allowed to play MLB pre-Jackie Robinson?

Again, an obvious and great question, which requires a different census-taking.

So, when I create a ballot, I can ask: “Would this player have been able to play in MLB prior to the color line being broken by Jackie Robinson”, and I can give answers like: “No”, “He might have been light-skinned enough”, “Yes, definitely”.
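A multiple-choice ballot like that one could be tallied along these lines. This is only a sketch: the answer strings are the three quoted above, while the `tally` function and the sample votes are made up for illustration.

```python
from collections import Counter

# The three answers offered on the ballot, as worded in the post.
ANSWERS = ("No", "He might have been light-skinned enough", "Yes, definitely")


def tally(votes):
    """Return the share of votes for each allowed answer (shares sum to 1)."""
    if not votes:
        raise ValueError("no votes to tally")
    unknown = [v for v in votes if v not in ANSWERS]
    if unknown:
        raise ValueError(f"unexpected answers: {unknown}")
    counts = Counter(votes)
    return {answer: counts[answer] / len(votes) for answer in ANSWERS}


# Made-up votes for a single player.
votes = ["No", "No", "No", "Yes, definitely",
         "He might have been light-skinned enough"]
shares = tally(votes)
print(shares)
```

Reporting the full distribution rather than only the winning answer keeps the disagreement among voters visible, which is itself a measure of how ambiguous the perception is.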

So, think in terms of the potential questions a researcher would have, and then we can create the census questions that would handle that.


#20          (see all posts) 2009/04/17 (Fri) @ 11:34

This is probably a small effect due to the proportions in the population of ball players, but something to think about.  People from the US are, I get the impression, generally dumb about actual nationalities, ethnicities, etc., and only pay attention to skin color (and maybe accent).  People from other parts of the world are, again my impression, much more cognizant of “real” nationality.  E.g. A player from China might be affected (more likely to hbp, etc.) when facing a player from Japan or Korea, even though a white player from Kansas would see the Japanese, Korean and Chinese player as “the same”.  I’m not sure to what extent this exists in other parts of the world, but the Chinese/Japanese difference is the classic one missed by most Americans.

