
THE BOOK--Playing The Percentages In Baseball


Wednesday, February 27, 2008

Academic Paper: Assessment of free agency on player performance

By Tangotiger, 09:59 AM

This is a 140-page PDF.  I haven’t read it yet.

I’d suggest that any comments reference the page number (and note whether it’s the PDF page number or the printed page number).


#1          2008/02/27 (Wed) @ 11:10

You can start on page 74 of the paper (the page number printed at the bottom of each page).

Basically, he took all free agents who signed multi-year contracts and had at least 250 AB in the walk year, the year before, and the year after.  OPS, RC, and Win Shares all increased in the walk year, and decreased after, to a statistically significant degree.

For OPS, the numbers (year before, walk year, year after) are .784, .795, and .773.  I think the dropoff from year 2 to year 3 is something like 8 runs, which is baseball-significant as well as statistically significant.
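A quick check of the year-to-year changes in those three OPS averages (a small Python sketch; the season labels are illustrative, the figures are the ones quoted above from the paper):

```python
# Average OPS for the sampled free agents, as quoted above from the paper.
ops = {"pre-option": 0.784, "option (walk)": 0.795, "post-option": 0.773}


def change(before: str, after: str) -> float:
    """Change in average OPS between two seasons, rounded to 3 decimals."""
    return round(ops[after] - ops[before], 3)


walk_year_gain = change("pre-option", "option (walk)")
post_drop = change("option (walk)", "post-option")
net_change = change("pre-option", "post-option")

print(f"pre -> walk:  {walk_year_gain:+.3f}")  # +0.011
print(f"walk -> post: {post_drop:+.3f}")       # -0.022
print(f"pre -> post:  {net_change:+.3f}")      # -0.011
```

On these averages the walk-to-post drop (.022) is twice the pre-to-walk gain (.011), so the post-option year also finishes below the pre-option year.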

But my first gut reaction is: you have a huge selection bias by taking only players who signed *multi-year* contracts.  Those would be the ones more likely to have had a career year (or, more accurately, an above-their-head year) before signing, which would explain the results. 

Also, these are older players, so you’d expect something of a decline each year.  How much of a decline?  Dunno.  Suppose a player drops from 80 RC per season to 50 RC over 10 years before losing his job: that would be a decline of only 0.3 RC per season.  So probably that doesn’t matter that much.

But the selection bias, I think, is quite large.  I think you have to use ALL free agents, not just free agents who signed multi-year contracts.
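A rough simulation of the selection-bias argument, assuming (hypothetically) that nobody's true talent changes at all across the three seasons and that only players whose walk-year OPS clears a cutoff land a multi-year deal. The talent spread, noise level, and cutoff are made-up parameters, not estimates from the paper:

```python
import random
import statistics


def simulate_selection(n_players=20000, mean_ops=0.750, talent_sd=0.060,
                       noise_sd=0.040, cutoff=0.780, seed=1):
    """Average OPS by season for players who pass a walk-year selection rule.

    Hypothetical model: each player's true OPS is fixed for all three seasons
    (no aging, no walk-year effect); observed OPS = true OPS + random noise.
    """
    rng = random.Random(seed)
    pre, walk, post = [], [], []
    for _ in range(n_players):
        talent = rng.gauss(mean_ops, talent_sd)
        observed = [talent + rng.gauss(0, noise_sd) for _ in range(3)]
        # Hypothetical selection rule: only a walk-year OPS at or above the
        # cutoff earns a multi-year contract (and a place in the sample).
        if observed[1] >= cutoff:
            pre.append(observed[0])
            walk.append(observed[1])
            post.append(observed[2])
    return (statistics.mean(pre), statistics.mean(walk),
            statistics.mean(post), len(walk))


pre_mean, walk_mean, post_mean, n_selected = simulate_selection()
print(f"{n_selected} players selected: pre {pre_mean:.3f}, "
      f"walk {walk_mean:.3f}, post {post_mean:.3f}")
```

With no walk-year effect and no aging built in, the selected group still shows a walk-year bump, and the pre- and post-option years come out roughly equal. That is the same mechanism as regression to the mean.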


#2    Tangotiger      2008/02/27 (Wed) @ 11:17

I agree totally.


#3          2008/02/27 (Wed) @ 11:21

Er, in #1, the age decline works out to 3 RC per season, not 0.3.  Oops.
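The corrected arithmetic, assuming (as #1 did) a straight-line decline from 80 RC to 50 RC over 10 seasons; the function name is just for illustration:

```python
def linear_decline_per_season(start_rc: float, end_rc: float, seasons: int) -> float:
    """Average per-season drop if a player declines linearly from start_rc to end_rc."""
    return (start_rc - end_rc) / seasons


# Hypothetical career arc from #1: 80 RC down to 50 RC over 10 seasons.
print(linear_decline_per_season(80, 50, 10))  # 3.0
```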


#4    MGL      2008/02/27 (Wed) @ 23:17

Have not read the article yet, but agree, agree!  And if the authors do not mention the potential (and likely) selection bias, they have NO BUSINESS doing a study like this, or any study involving statistics for that matter!

Not to mention that if the total RC or WS (or lwts, or whatever you use) for all the players in the walk year is greater than average (for that age group), there would automatically be a drop-off in the next year due to regression to the mean, even if there were no selection bias (although it would be hard for that to happen with no selection bias).  I hope they addressed that (regression toward the mean).

And yes, the aging dropoff is going to be significant, depending on the average age of the players.  You are probably talking about something like .5 wins (5 or 6 runs) of expected drop per year, which would account for almost all of their observed drop right off the bat.  I also HOPE that the authors considered aging, although that would not explain the prior year being lower than the walk year.
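Putting the two rough figures from this thread side by side (a back-of-the-envelope sketch; the 8-run drop is #1's estimate and the 5 to 6 runs is the aging guess above, neither comes from the paper):

```python
# Rough figures from this thread, not from the paper.
observed_drop_runs = 8.0             # #1's estimate of the walk-year -> post-year drop
expected_aging_runs = (5 + 6) / 2    # midpoint of the "5 or 6 runs" aging guess

unexplained = observed_drop_runs - expected_aging_runs
share = expected_aging_runs / observed_drop_runs
print(f"left after aging: {unexplained:.1f} runs "
      f"({share:.0%} of the drop covered by aging)")  # 2.5 runs (69% ...)
```

On those numbers, aging alone covers about 5.5 of the roughly 8 runs, before any allowance for regression to the mean.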

My first reaction would be that aging, regression to the mean, and selection bias (the latter two being related) would explain results like these, but I’ll read the article before etching that thought in stone.


#5    MGL      2008/02/28 (Thu) @ 00:17

OK, reading paper.

Significant disparities in performance between pre-option year and post-option year offensive production suggest that free agency influenced changes in player productivity.

Isn’t one of the cardinal rules of this kind of analysis not to equate correlation with causation?  To state that a “significant disparity” (“difference,” I think, is the English word) “in performance” means that anything influenced anything is a statistical no-no, is it not?

As an aside, I would NOT use win shares in this type of analysis (actually, I would never use win shares for any type of analysis).  It is not fine-grained enough for this type of study, and it is potentially too tied to a team’s win-loss record.  Which brings up another side issue.  They write:

Also, covariates age and team winning percentage were incorporated into this study in an attempt to control for specific factors thought to impact individual player performance.

I have never heard it mentioned that team wp is “thought to impact individual player performance.” I suppose it may be true, but has anyone ever studied that before?  Do they have any citations for that claim?  If not, it should NOT have been made, especially since it is probably not true.

And of course, if any sabermetrician did a study like this, they would use wOBA or lwts and NOT OPS, RC, or WS.  I think.

Finally (as far as my asides go), I am worried that park effects might corrupt a study like this.  What if certain teams are dominating the signings of FAs to multi-year contracts?  That would definitely be a potential problem and might require park-adjusting the data.

Anyway, back to paper.

Fifty players, who otherwise qualified for the study based on their free agent status and number of at-bats, were eliminated because they played for two or more teams in a single season at some point during the pre- and post-free agent years. In-season player movement complicates the interpretation of player motivation and inequity perception as it translates into individual offensive performance.

I am not sure I would have eliminated these players (I don’t think it makes much difference in terms of the various hypotheses), but it is probably no big deal other than it reduces the sample size of the data.

Data for two of the dependent variables, Runs Created and Win Shares, was normalized for each player. For instance, Runs Created metrics were divided by players’ number of at-bats and Win Shares metrics were divided by players’ number of games played for each season. The average number of at-bats and games played for the subjects remained largely the same throughout the three time periods being examined. On average, players played in 131 games and totaled 467 at-bats in their pre-option year. In their option year, players averaged 134 games and 473 at-bats. During the post-option year, players averaged 131 games and 469 at-bats.

Should RC be divided by AB?
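One possible reason to prefer a per-plate-appearance denominator (my gloss, not something stated in the paper or the thread): Runs Created credits walks in the numerator, but walks are not at-bats, so two seasons with the same RC and AB look identical per AB even when one needed many more trips to the plate. A sketch with hypothetical players, using a simplified PA = AB + BB:

```python
from dataclasses import dataclass


@dataclass
class Season:
    rc: float  # runs created
    ab: int    # at-bats
    bb: int    # walks

    def rc_per_ab(self) -> float:
        return self.rc / self.ab

    def rc_per_pa(self) -> float:
        # Simplified PA = AB + BB; HBP, SF and SH are ignored in this sketch.
        return self.rc / (self.ab + self.bb)


# Hypothetical players: same RC and AB, very different walk totals.
low_walk = Season(rc=80, ab=500, bb=30)
high_walk = Season(rc=80, ab=500, bb=90)

for name, s in [("low-walk", low_walk), ("high-walk", high_walk)]:
    print(f"{name}: RC/AB {s.rc_per_ab():.3f}, RC/PA {s.rc_per_pa():.3f}")
```

Both players show 0.160 RC/AB, but the high-walk player produces fewer runs per plate appearance.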

Interesting that players averaged 134 games in the option year and 131 in both the pre- and post-option years.  Was it BP that did a similar study and concluded that there was no difference in rate performance in the option year, but an increase in games played, suggesting that players in a walk year wanted to stay in the lineup as much as possible?

The average age for the free agents in this study was 31.

You are DEFINITELY going to see significant aging drop-offs in any subsequent year at that age.  I don’t have my aging curves in front of me, but I would think it would be on the order of 5 runs.

The first General Research Question asked simply whether or not free agency affected the offensive performance of Major League Baseball players. According to each of the of the MANOVA analyses for the three dependent variables, the answer is yes.

Again, I don’t think you should be saying things like “A affects B” as opposed to “there is a relationship between A and B” or “A is correlated with B” or something like that.  I would hope that this guy’s advisers (strongly) pointed that out to him.

As another aside, I don’t really see any advantage in duplicating the study with OPS, RC, and WS.  WS adds defense, I guess, but it is so coarse that it probably should not have been used anyway.  RC and OPS are basically measuring exactly the same thing, and both do it somewhat poorly (especially OPS) compared to wOBA and lwts.

I would, however, have no problem if the authors had just used OPS, as that is a much more easily understood, better-known, and more widely accepted metric.  Also, any defects in the metric itself should not bias the results in a study like this (although they could, given the selective sampling issue).

ANOVA analyses revealed no significant differences in OPS (p=.071), Runs Created (p=.064), or Win Shares (p=.068) in the last year of a player’s contract when compared to the previous year at the .05 significance level. Therefore, the statistical null hypothesis of no difference in offensive productivity between pre-option year and option year is accepted.

God, how I HATE this blind rejection or acceptance of a null hypothesis based on some arbitrary cutoff point of p.  That is a ridiculous logical assertion and should be banned from all statistics courses from now on!

Give us the significance level, but don’t tell us what to accept and what to reject! 
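To make the point concrete: the paper's own pre-option vs. option-year p-values (.071, .064, .068) get opposite verdicts depending only on which cutoff you pick. A minimal sketch:

```python
# p-values quoted from the paper for the pre-option vs. option-year comparison.
p_values = {"OPS": 0.071, "Runs Created": 0.064, "Win Shares": 0.068}


def verdict(p: float, alpha: float) -> str:
    """Conventional hypothesis-test call for a given cutoff alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"


for alpha in (0.05, 0.10):
    calls = {metric: verdict(p, alpha) for metric, p in p_values.items()}
    print(alpha, calls)
```

Same data, same p-values; only the arbitrary alpha changes, which is exactly the .05 vs. .10 switch the paper makes later.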

Research Question Four sought to answer whether there were significant mean performance differences, as measured by OPS, Runs Created, and Win Shares, from players’ option year to post-option year, when controlling for age. Univariate analyses revealed no significant differences in OPS (p=.088), Runs Created (p=.243), or Win Shares (p=.135) in the last year of a player’s contract when compared to the previous year, when removing the effects of age. As such, the statistical null hypothesis of no difference in offensive performance from one year to the next is accepted. Again, the results could not substantiate either equity theory predictions or expectancy theory predictions.

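Did they screw this part up, mixing up pre and post?  The question is about option year to post-option year, but the results sentence compares the last year of a player’s contract to the previous year.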

With respect to performance comparisons between players’ pre-option year and option year, results at the .10 significance level indicated that player performance improved from the pre-option year to the last year of a player’s contract for each of the performance measurements, OPS (p=.071), Runs Created (p=.064), and Win Shares (p=.068). These findings, while not significant at the .05 level are worth discussing...

Well, I am glad that they think it is “worth discussing” even though the results are not significant at the (ever-so-magical) .05 level.

I only read up to the Summary on page 91.

Basically, as far as I can tell, without addressing the issue of selective sampling and decline in performance due to aging, I don’t see how you can generate any valid or useful conclusions.

But I am no statistician.  It is nice to do such a thorough analysis, in a statistical sense, with all the right “words” being used, but if your methodology or treatment is flawed or defective, of what good is all the statistical rigor?


#6    MGL      2008/02/28 (Thu) @ 00:19

One more thing:  I’m not saying that, even after correcting for possible selective sampling issues, there won’t still be similar results…


#7          2008/03/01 (Sat) @ 13:15

I suppose the reasoning on team performance is that guys stay focused on winning teams, but the problem is that even if it’s true, it causes further problems in addition to the selection bias already noted.  We don’t know about the causal connection team wp===>player performance, but we most certainly do know about the causal connection player performance===>team wp.  Thus, the authors have built an obvious endogenous regressor into the model, and the resulting endogeneity biases all of the coefficient estimates.


#8    MGL      2008/03/02 (Sun) @ 15:59

#7, that is a good point!

