THE BOOK--Playing The Percentages In Baseball

Wednesday, September 15, 2010

What park factors to use for awards…

By , 01:07 AM

There was some discussion on BBTF about what park factors to use if you want to adjust CC and Felix’ stats for the CYA.

Some prominent posters over there think that one year, 2010, park factors (how the park “played” this year) are the ones to use, since you are not trying to figure true talent or performance going forward.

While the latter is true, I strongly disagree - with one qualification.  That qualification is that a one-year park factor does include, to some extent, weather conditions for that year (and the schedule and opposing players I guess).  However, I think that is a small consideration, and can easily be accounted for by weighting each year in computing multi-year PF’s.

Ignoring that (the weather and opponent issues), the one-year park factor is irrelevant in determining how a pitcher (or batter) performed for that year - we only care about a park’s true park factor (for that year).  That is true for an award and not just for a true talent estimate or a projection.

For example, let’s say that the “actual” park factor for 2010 in Safeco was 1.00, which is entirely possible, since we are only dealing with runs scored and 81/81 games (off the top of my head, I would say there is a 15% chance of that occurring by chance for a park that has a true PF of near .90).  And let’s assume that the weather had nothing to do with that - it was simply a fluke, and as I said, not an unusual one at that.  Let’s also assume that true PF for Safeco is .92.
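As a rough sanity check on that off-the-cuff figure, here is a minimal sketch using a normal approximation. Everything numeric in it is an assumption rather than something from the post: total scoring of about 9 runs per game on the road, a game-to-game standard deviation of about 4.5 runs, independent home and road samples of 81 games each, and a PF defined as the plain home/road scoring ratio. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_observed_pf_at_least(threshold, true_pf, games=81,
                              road_rpg=9.0, sd_per_game=4.5):
    """Normal approximation to P(observed home/road ratio >= threshold)."""
    home_rpg = true_pf * road_rpg
    # Standard error of the per-game scoring average over `games` games.
    se_road = sd_per_game / sqrt(games)
    # Assumption: the per-game spread at home scales with the scoring level.
    se_home = true_pf * sd_per_game / sqrt(games)
    # Delta-method standard error of the ratio home average / road average.
    se_ratio = true_pf * sqrt((se_home / home_rpg) ** 2
                              + (se_road / road_rpg) ** 2)
    z = (threshold - true_pf) / se_ratio
    return 1.0 - NormalDist().cdf(z)

for true_pf in (0.90, 0.92):
    p = prob_observed_pf_at_least(1.00, true_pf)
    print(f"true PF {true_pf:.2f}: P(observed >= 1.00) ~ {p:.2f}")
```

Under these assumed inputs the chance comes out on the order of one in ten, the same general range as the estimate above; a different assumed spread in per-game scoring would move it.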

If we want to determine how well Felix actually pitched at Safeco, what do we use?  I contend we use the .92 and I am pretty sure that is correct. Why do we care that the runs scored at home when Felix wasn’t pitching were exactly equal to the runs scored in other parks, in 2010?  We don’t.  And of course if Felix’ own Safeco PF (only when he pitched) was 1.00 also, the sample is so small as to be practically meaningless.

BTW, that is the same mistake that many people (including prominent analysts) make when adjusting team records for strength of schedule.  To do that, you use the estimated true talent of opponents, and not their actual records.  I can easily prove that using actual records is wrong.  Say you have a two-team league and one team is 100-62 and the other is 62-100 (of course).  If you adjust each team’s record for their strength of schedule using the actual records of each team, guess what you get?  You get each team being a .500 team.  That is true regardless of each team’s record in a 2-team league!
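The two-team league argument can be checked with a few lines of arithmetic. The sketch below tries two simple schedule adjustments, neither of which is named in the post: an additive one (add the opponents' edge over .500) and an odds-ratio one. Both function names are hypothetical.

```python
def additive_adjust(win_pct, opp_win_pct):
    """Add the opponents' edge over .500 to a team's own winning percentage."""
    return win_pct + (opp_win_pct - 0.5)

def odds_ratio_adjust(win_pct, opp_win_pct):
    """Scale the team's win odds by the opponents' win odds."""
    odds = (win_pct / (1 - win_pct)) * (opp_win_pct / (1 - opp_win_pct))
    return odds / (1 + odds)

def two_team_league(wins, losses):
    team = wins / (wins + losses)
    # In a two-team league the opponent's record is the mirror image.
    opp = 1 - team
    return additive_adjust(team, opp), odds_ratio_adjust(team, opp)

for wins, losses in [(100, 62), (62, 100), (130, 32)]:
    add, odds = two_team_league(wins, losses)
    print(f"{wins}-{losses}: additive {add:.3f}, odds-ratio {odds:.3f}")
```

Whatever record goes in, both adjustments return .500, because the opponent's actual record is just the team's own record reflected back at it.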


Sabermetrics, Awards, Parks
#1    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 02:59

I contend we use the .92 and I am pretty sure that is correct.

You should feel perfectly correct.

The park factor is the park factor, but how you decide to calculate may differ a bit year-to-year.  I wrote this several years ago, but the points are still relevant:

http://www.tangotiger.net/parks.html

1 - static conditions: the dimensions of the park. You should use 100 years, if the park hasn’t changed, if you have it, because 300 feet is 300 feet.

2 - dynamic conditions: the weather, the cut of the grass, the wetness of the field, the wind, etc, etc. You should use multi-year if these things are predictable-dynamic, but single-year if they are unpredictable-dynamic. If the groundskeeper is the same guy, and cuts the grass kinda the same way, then use multi-years. The wind patterns probably change drastically, so you should use single-year.

3 - other parks: Park factors are relative to other parks. You can have, say, the Big O be a hitter’s park 10 years ago and a pitcher’s park today, even if the park is exactly the same and the climate hasn’t changed, simply because the other parks have become hitter’s parks or new hitter’s parks have been introduced.

(Note: because of this, regression towards the mean is not necessarily towards 1.00. You should look at the same parks year-to-year, and in some years, that could mean 1.02 or .98, etc, because the other parks that are not part of the analysis do make up the park factor. )

4 - the tendencies of hitters/pitchers: If you have a team filled with lefties or flyballers or whatever, that introduces a bias. If a park is optimal for flyballers and your team is filled with flyballers, the data will not show an accurate park factor. The sample of your players should be representative of your population. For extreme parks, like Fenway and Coors, I’m sure they are not. To what degree, I’m not sure.

5 - the quality of hitters/pitchers: Barry Bonds is great no matter where he hits, and he might not be hampered as much as someone else by playing in a pitcher’s park. So, he hits the ball 390 feet instead of 400 feet. If ReyRey hits the ball 320 feet instead of 340, that’s a big difference. Again, I don’t know the degree of impact here, but it is a consideration.

6 - the game context: Different game states (score/inning/base/out) result in different hitting/pitching approaches. Again, probably small impact, but needs to be determined.

So, your “park factor” is made up of several factors, each of which needs to be analyzed on its own. For some, you can use multi-years, and others you need single-years, etc, etc.


#2    Mike Rogers      (see all posts) 2010/09/15 (Wed) @ 03:33

Would it be appropriate to use a PF tailored to the road parks CC/Felix/Liriano/et al. pitched in this year? Multi-year PF’s, obviously, but weight them in with his home PF to get a “true” PF, so to speak?


#3          (see all posts) 2010/09/15 (Wed) @ 04:51

The problem with using park factors for CY Young considerations is that you are assuming a pitcher has no ability to adjust his game to the park or weather conditions. 

Besides, weather conditions can vary wildly during the course of a game, so any adjustments would be riddled with uncertainty and error due to the poor data.

As for using team records, or even estimated records based on true talent, such records include the pitching and defense of the other teams.  Pitchers only face the other teams’ hitters.  The A’s true talent may be that of a .500 team, but their hitters are well below average, so an opposing team’s pitchers have an easier time against them than its hitters do.

Using the hitters’ true talent is also problematic, since SP’ers sometimes face a hitter only once or twice a year.  Anyone who faced the Red Sox in April would receive unfair credit for facing a lineup that on paper was very tough, but in reality, at the time, was not a tough lineup. The same thing can be said of facing them after July 1, due to injuries.  Sometimes it’s not who you play, but when you play them.

And then we can look at umpires.  If you get lucky and face tough-hitting teams with an umpire with a large or small strike zone, this would be more of a factor than park.  Of course, good pitchers can adjust to an umpire’s strike zone as well, just like park.

All you should do really is to look at the teams they faced, and try to estimate who faced the tougher batters, taking into consideration how the team was hitting when the pitcher faced them, and adjust accordingly. 

But really, why bother, what it comes down to in the end is that Cy Young is not about who has the most talent or ability, but who performed the best in terms of actual results.  IP, K/9, WHIP, K/BB and R/A are all important, although not all rated equally. Only if the candidates are pretty much a wash, then you would go with the guy who pitched with the toughest schedule.  And if that is even, then go with park.

CC and King Felix is not even close.  Better K/9, better WHIP, lower runs allowed.  Too bad he could not pitch against the Mariners for 5 games like CC will get to do against the Orioles.  Slam dunk really for King Felix.  If you want to look at SOS, no question, King Felix had a tougher schedule.


#4    Rally      (see all posts) 2010/09/15 (Wed) @ 10:00

BB-ref has changed their park factors.  They had been using a multiyear park factor up through 2009, and not updating for 2010 stats.

Now it looks like it is an average of 2009 and 2010 park factors, updated weekly, and the change is enough (along with CC’s 8 shutout innings and Felix getting hammered by the Angels) to put CC in the WAR lead.  You may prefer another approach (3 year? 4 year? add a regression component?) but I think right now they are close enough that we can’t be sure who has pitched better.

At this point the Cy Young races are too close to call in both leagues.  Let’s let them finish the season before we start campaigning for a pitcher.

As for strength of schedule, CC is also facing the Rays, Red Sox, and Home Run Derby of Canada more than Felix does.  BB-ref counts strength of schedule in the WAR formula.


#5    Rally      (see all posts) 2010/09/15 (Wed) @ 10:02

In the NL it looks like we’re down to Halladay and Jimenez.  Josh Johnson has great numbers too, but he’s done, as is Johan Santana.


#6    Ken      (see all posts) 2010/09/15 (Wed) @ 10:30

I’d include Wainwright in the discussion in the NL.

And the argument over SOS for Felix versus CC really comes down to numbers. Who faced the better hitters over the year? I can’t find that statistic quickly, but someone must have calculated it.


#7    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 11:07

Now it looks like it is an average of 2009 and 2010 park factors, updated weekly, and the change is enough

That’s not a good way to do it in my opinion.  MGL in this thread already laid it out, and I had an article about it that linked it.  I think Patriot does 5 years for PF, and that’s much better.  Is that what Fangraphs uses?


#8    Rally      (see all posts) 2010/09/15 (Wed) @ 11:10

Prospectus has that data.  Edge to Felix, but a small one:

OPS of batters faced:
.717 CC
.727 Felix


#9    Rally      (see all posts) 2010/09/15 (Wed) @ 11:31

I’d prefer a 5 year PF to 2 (except for the Yankees, who only have 2 years), but it makes little difference.  Forman is getting a .95 PF using 2 years, I get about .96 with 5 years:

2010 .91
2009 .97
2008 .97
2007 .99
2006 .94
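The averages being compared can be reproduced directly from the five yearly values listed above. This is plain arithmetic with no regression, and the function name is hypothetical.

```python
# Yearly park factors exactly as listed in the comment above.
pf_by_year = {2010: 0.91, 2009: 0.97, 2008: 0.97, 2007: 0.99, 2006: 0.94}

def mean_pf(pf_by_year, years):
    """Unweighted mean of the park factors for the given seasons."""
    return sum(pf_by_year[y] for y in years) / len(years)

five_year = mean_pf(pf_by_year, [2006, 2007, 2008, 2009, 2010])
two_year = mean_pf(pf_by_year, [2009, 2010])

print(f"5-year mean: {five_year:.3f}")
print(f"2-year mean: {two_year:.3f}")
```

The five-year mean lands on the "about .96" quoted above. The plain two-year mean of these rounded values is .94 rather than .95, so the two-year figure presumably involves rounding, regression, or weighting that this sketch does not attempt to reproduce.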


#10    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 11:33

Rally/8: also not a good enough way to do it.  I presume BPro is using current-year stats to determine that?

This is what MGL is driving at: you need to know the *innate* talent of the player or park.  So, what you want to do is get the Chone forecast for each hitter that each pitcher faced.  Barring that, take each hitter’s last 3 or 4 years of hitting stats.


#11    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 11:35

Rally: good stuff.  Well, then I am wrong, and two years, in this case, is not so problematic.


#12    MGL      (see all posts) 2010/09/15 (Wed) @ 11:37

#6 and #8, are you guys using or talking about their numbers from THIS year?  I just got done saying how that is a horrible (misleading) way to do the adjustments!  It is just as bad as telling us that Martin Prado or Andres Torres are two of the best hitters in baseball and Jeter and Beltran (OK, Beltran was hurt this year) are two of the worst!  You want to adjust pitchers by the TRUE TALENT level of their opponents, NOT by what they did THIS year!

Tango, I don’t think you EVER want to use one-year PF.  Way too much noise.  As I said, if you want to account for weather, etc., for THAT year, you still have to use multi-year factors, but you can either weight that year more heavily or you can “manually” adjust for weather if that is even possible.  But to take a park whose long-term, regressed (estimated true) PF is .90 and use a one-year PF of 1.00 because you think that it may have been warmer in that season and/or the wind was blowing out a lot, is ridiculous, IMO.


#13    Colin Wyers      (see all posts) 2010/09/15 (Wed) @ 11:42

“It is just as bad as telling us that Martin Prado or Andres Torres are two of the best hitters in baseball and Jeter and Beltran (OK, Beltran was hurt this year) are two of the worst!”

No, it really isn’t.

Projection systems are better than current season stats at establishing the true talent of any ONE player in that time period. They are WORSE at establishing the talent of a POPULATION of players. This is pretty rudimentary stuff.


#14    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 11:49

MGL/12:

I guess my post was too nuanced.  While I did say that using one-year PF was appropriate, I meant it only for the weather-related component.  If the weather is, say, 20% of a park factor, then you can use, say, 20% of one-year PF, and then equally weight all 10 years you have of a park.  So, you might give the current year a weight of 1.20, and the other nine years 1.00.

If on the other hand you have a park that has LOTS of wind, and the wind patterns change constantly, and is nowhere close year to year, then the weather-related component might account for 50% of a PF.  And so, you weight the current year much more.

It’s all about figuring out what affects the park, and trying to find the right weightings.  You can, AND SHOULD, weight a dome park much differently than Wrigley.  For a dome park, you might say you will weight the last 25 years equally.  But for Wrigley, you would have a much different weighting scheme.

Agreed?

All numbers for illustration purposes only.
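The weighting idea in this comment can be sketched as a weighted mean in which only the current year's weight changes. The 1.2 weight is the one given above; the 1.5 weight for a windy park is an assumed stand-in for "much more", and the yearly values are simply the five listed in comment #9, reused here for illustration. The function name is hypothetical.

```python
def weighted_pf(pf_by_year, current_year, current_weight):
    """Weighted mean of yearly PFs; every non-current year gets weight 1.0."""
    total = 0.0
    weight_sum = 0.0
    for year, pf in pf_by_year.items():
        w = current_weight if year == current_year else 1.0
        total += w * pf
        weight_sum += w
    return total / weight_sum

# Illustrative yearly values only (the five listed in comment #9).
pf_by_year = {2010: 0.91, 2009: 0.97, 2008: 0.97, 2007: 0.99, 2006: 0.94}

schemes = [
    ("equal weights (dome-like)", 1.0),
    ("weather ~20% of the PF", 1.2),
    ("weather ~50% of the PF (assumed weight)", 1.5),
]
for label, w in schemes:
    print(f"{label}: {weighted_pf(pf_by_year, 2010, w):.3f}")
```

Because 2010 is the lowest year in this series, a heavier current-year weight pulls the estimate down, but only slightly: the other seasons still carry most of the weight.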


#15    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 11:52

They are WORSE at establishing the talent of a POPULATION of players

If by population of players you mean all players in MLB, then I might agree (if only because of the sheer number of players, not on principle).

If by population, you mean a subpopulation of players, such that it is the whole population of hitters faced by CC, then I disagree.

CC could very well have faced a disproportionate share of hitters that have done worse than expected.  While those performances adjust our estimate of their true talent, our best guess as to the talent level of the CC subpopulation is a combination of their performance against all pitchers in 2010, plus our forecast for them entering 2010.


#16    dkappelman      (see all posts) 2010/09/15 (Wed) @ 11:54

#7 Tango:  FanGraphs does use the 5 year regressed.  It’s the exact same method as Patriot’s, though we calculate it in house and there may be some slight differences. 

We’re still on ‘09 park factors though and we’ll update things at the end of the season.  I think I set the new Twins park at 1 for now, but with only one year I imagine it will be close to 1 anyway since it will be heavily regressed.  I guess the new Yankee stadium could change a bit too.


#17    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 12:12

To further show why you can’t just use the current year stats of CC’s hitters:

What if this was April 15?  Are you really going to rely on the stats that those hitters accumulated over the first 10 or 12 games to indicate how bad the hitters were?

No, of course not.  Because sample performance is sample performance no matter how many games we are talking about.  Granted, if CC faced 900 batters, each of which faced 400 pitchers, then, yeah, it’s going to come close to their true talent level as a population of hitters CC faced. 

But, we have a better estimate if we ALSO include their stats from 2009 and 2008 and 2007, and if we know how old they are, and if we know what parks those hitters were playing in, and so on.


#18    JEH      (see all posts) 2010/09/15 (Wed) @ 12:28

For the specific case of evaluating performance for a particular season, I would apply some weight to the single season park factor if the single season numbers varied significantly from the historical numbers.

The amount of weight applied to the current season would be related to how confidently I could explain the difference in the numbers.


#19    Colin Wyers      (see all posts) 2010/09/15 (Wed) @ 12:28

No, I wouldn’t use unregressed lines, be it 10 or 100 or 1000 games (although at a certain number of games the amount of regression becomes unnoticeable).

The important thing to understand is that the hitters who faced Sabathia aren’t selectively sampled based upon their results in those PA - we’re dealing with a largely unbiased sample. So with large enough N we’re going to reach true talent. (As you note, below that point we have to regress.)

When we use projection systems, we’re making tradeoffs to get the best individual forecasts we can. For what we want to use projection systems for, this is fine, desirable and necessary. But it does reduce their utility in other circumstances, the same way that the sum of team WAR(P) is less indicative of a team’s true talent than their third-order wins or Pythag or whatever.


#20    Colin Wyers      (see all posts) 2010/09/15 (Wed) @ 12:30

I should add - as Tom notes, you have to be careful to make sure that Sabathia’s own performance as a pitcher isn’t having an effect on his own performance line - you either have to exclude that hitter’s performance against Sabathia or downweight it in your sample. But that’s not terribly hard to do, and so I don’t think it invalidates the methodology altogether.
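The exclusion step described here might look something like the sketch below. It assumes a per-PA rate stat such as wOBA, so that a season line can be split by plate appearances; the `Line` class, both function names, and the sample numbers are all made up for illustration. It covers only the exclusion and the weighting by PA faced, not any regression of the resulting lines.

```python
from dataclasses import dataclass

@dataclass
class Line:
    pa: int        # plate appearances
    value: float   # per-PA rate over those plate appearances (e.g. wOBA)

def exclude_matchup(season: Line, vs_pitcher: Line) -> Line:
    """Season line with the plate appearances against one pitcher removed."""
    pa = season.pa - vs_pitcher.pa
    if pa <= 0:
        raise ValueError("no plate appearances left after exclusion")
    total = season.value * season.pa - vs_pitcher.value * vs_pitcher.pa
    return Line(pa, total / pa)

def opponent_quality(batters):
    """Average of the leave-out lines, weighted by PA against the pitcher."""
    num = 0.0
    den = 0.0
    for season, vs_pitcher in batters:
        rest = exclude_matchup(season, vs_pitcher)
        num += rest.value * vs_pitcher.pa
        den += vs_pitcher.pa
    return num / den

# Made-up sample: (full season line, line against the pitcher in question).
batters = [
    (Line(600, 0.350), Line(20, 0.250)),
    (Line(500, 0.320), Line(10, 0.400)),
]
print(f"opponent quality: {opponent_quality(batters):.3f}")
```

A hitter the pitcher dominated looks slightly better once those plate appearances are removed, which is the effect the exclusion is meant to capture.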


#21    Patriot      (see all posts) 2010/09/15 (Wed) @ 12:32

The Twins RPG at home is 8.87 versus 8.80 on the road, so it looks like it’s going to get a 100.

The Yankees split is 10.23/8.61, which is a big difference from ‘09 (10.11/10.48), and will result in it getting a 103 or so even after my crude regression.
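For readers who want to see how home/road splits like these turn into a park factor, here is a minimal sketch. It is not Patriot's actual method: it averages the yearly home/road ratios, halves the distance from 1.0 (a team plays only half its games at home), and applies a regression weight. The weights 0.6 and 0.7 are assumptions chosen for illustration, and the function name is hypothetical.

```python
def run_pf(home_road_ratios, regression=1.0):
    """Runs park factor from yearly home/road scoring ratios.

    The mean ratio's distance from 1.0 is halved, since a team plays only
    half its games at home, then scaled by `regression` (1.0 = no regression).
    """
    mean_ratio = sum(home_road_ratios) / len(home_road_ratios)
    return 1.0 + (mean_ratio - 1.0) * regression / 2.0

# One year of data for the new Twins park; 2010 and 2009 for the Yankees.
twins = run_pf([8.87 / 8.80], regression=0.6)
yankees = run_pf([10.23 / 8.61, 10.11 / 10.48], regression=0.7)

print(f"Twins:   {100 * twins:.0f}")
print(f"Yankees: {100 * yankees:.0f}")
```

With these assumed weights the sketch rounds to 100 and 103, in line with the figures quoted above, though that agreement depends on the weights chosen.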


#22    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 12:42

So with large enough N we’re going to reach true talent.

Right, that’s no different than me saying:

Granted, if CC faced 900 batters, each of which faced 400 pitchers, then, yeah, it’s going to come close to their true talent level as a population of hitters CC faced. 

The question is what is “large enough N”.  I’d like to believe that single-season performance numbers (park- and opponent-adjusted) are good enough.  They probably are.  I just don’t like it that they are being used without a study to back it up.

I don’t think we disagree much, even if it might look to interested observers as though we do.


#23    JEH      (see all posts) 2010/09/15 (Wed) @ 12:49

Does anyone advocate a park adjustment, in this context of player-to-player performance comparisons, based on the handedness of the pitcher?

In this case, CC throws in a park that (I believe) is especially unfriendly to right handed pitchers, so we may be giving him credit for throwing in a tougher environment than he actually is as a lefty.


#24    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 13:14

You want to know who has performed as the most outstanding pitcher.  Being LHP is as much a trait as being a FB pitcher.  You need to account for the context the pitcher is pitching in compared to the context that all pitchers pitch in.


#25    JEH      (see all posts) 2010/09/15 (Wed) @ 14:30

@24

I understand (and agree with) your LHP / FB analogy so you could easily extend this argument in that direction.

Also, I don’t have a position on this, but let me play the devil’s advocate here.  The argument laid out below is one that can be construed as being against single year Park Factors but is mainly aimed at the concept of knowing exactly what the numbers mean.

Finally, I am assuming left handed hitters hit better at Yankee Stadium than at the average park.

This season the Yankees offense averaged about 6.0 runs per game at home and 4.7 runs per game on the road.  Yankees opponents, against the Yankees, averaged (roughly) 4.1 RPG in New York and 3.9 RPG elsewhere. 

The Yankees offense, with only two righties (Jeter and ARod) seems built to take advantage of its home field.  The Yankees pitching staff, with 1.5 lefties (CC and Pettitte) is not. 

The result is that the Yankees (single season) PF is inflated by an offense designed to take advantage of the stadium and a pitching staff not designed to take advantage of it.  So Sabathia’s numbers (2.56 ERA at home / 3.46 road) look good, but part of that is him being in a favorable situation as not only a lefty in Yankee Stadium, but a lefty surrounded by a cast that makes him look good in the light of the Park Factor metrics currently in use.

So, when we consider how well Sabathia pitched this year how much do we want to take into account the Park Factor?  For example, if the Yankees had added Cliff Lee instead of Javier Vazquez last season then we might expect the visiting team to average 3.5 RPG instead of 3.9 at Yankee Stadium.  Under those circumstances, with no change to Sabathia’s performance, his (PF adjusted) numbers slip a bit.
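The split behind this argument can be made explicit using only the rough per-game figures quoted in this comment. The sketch separates the home/road ratio for the Yankees' own offense from the ratio for their opponents, and compares both with the combined ratio that a simple one-year park factor is built from.

```python
# Rough per-game averages quoted in the comment above.
yankees_home, yankees_road = 6.0, 4.7
opponents_in_ny, opponents_elsewhere = 4.1, 3.9

# Home/road ratio seen through the Yankees' hitters only.
offense_ratio = yankees_home / yankees_road
# Home/road ratio seen through the Yankees' pitchers only.
opponent_ratio = opponents_in_ny / opponents_elsewhere
# Combined ratio of total scoring, the usual input to a runs park factor.
combined_ratio = (yankees_home + opponents_in_ny) / (yankees_road + opponents_elsewhere)

print(f"Yankees offense home/road: {offense_ratio:.3f}")
print(f"Opponents home/road:       {opponent_ratio:.3f}")
print(f"Combined home/road:        {combined_ratio:.3f}")
```

Most of the combined home/road gap comes from the Yankees' own hitters; the ratio seen through their pitching staff alone is much closer to 1.0.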


#26    MGL      (see all posts) 2010/09/15 (Wed) @ 14:30

“Projection systems are better than current season stats at establishing the true talent of any ONE player in that time period. They are WORSE at establishing the talent of a POPULATION of players. This is pretty rudimentary stuff.”

I just skimmed the posts after that comment, but it is 100% incorrect.  I don’t even know what that means.

Obviously 100 players with 100 PA each is better than one player with 100 PA, but 100 players with 100 PA each is a lot worse (standard error-wise) than one player with 10,000 PA.  And a projection (actually a retroactive projection, since we want to estimate the true talent of these players over the course of the 2010 season, and not in the future) is ALWAYS going to be better than actual stats.  Always.  The projection includes the actual stats of course, plus other relevant information, so it is always going to be a better estimate of true talent over any period of time.

And when adjusting a player’s stats for “opponent strength” we ALWAYS want to use an estimate of the opponents’ true talent during that time period.

So you can estimate that true talent any way you want, but don’t tell me that actual stats are a better estimate of that true talent than a “projection” which includes those actual stats. That makes no sense.

IOW, if you agree that a player should be adjusted by the TRUE TALENT of his opponents and not what his opponents actually did for that season, then we are in agreement.  If you are arguing that you want to adjust by what that player actually did in that season, then I am afraid you are 100% wrong, and I can easily give an example of how that is true:  Say a pitcher faces Pujols 10 times in a season and you want to adjust his pitching stats.  Say he was on the DL for most of the season and was actually healthy when he played. But he only got 200 PA and had a wOBA of .310, which is entirely possible by chance alone. Why would you adjust that pitcher’s stats by the .310 wOBA?  What magic is there in those 200 PA (or 190 when he didn’t face that pitcher) that is not in his PA from last year and the year before?  None of course (other than you weight the current year more heavily of course)!

What if he had 20 PA but he was perfectly healthy and had a wOBA of .110.  If a pitcher faced him 5 times that year, would you use the .110 wOBA to adjust his pitching stats?
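One simple way to see the point of these two examples is to blend the observed line with a prior estimate, weighting each by plate appearances. This is only a sketch of that general idea, not MGL's projection method: the prior wOBA of .420 and its weight of 600 PA are hypothetical numbers, and the function name is made up.

```python
def blend(observed, pa, prior, prior_pa):
    """PA-weighted blend of an observed rate with a prior estimate."""
    return (observed * pa + prior * prior_pa) / (pa + prior_pa)

PRIOR_WOBA = 0.420   # hypothetical pre-season estimate for an elite hitter
PRIOR_PA = 600       # hypothetical weight on that estimate, expressed in PA

for pa, woba in [(200, 0.310), (20, 0.110)]:
    estimate = blend(woba, pa, PRIOR_WOBA, PRIOR_PA)
    print(f"{pa:>3} PA at {woba:.3f} -> estimate {estimate:.3f}")
```

In both cases the estimate stays far closer to the prior than to the observed line, and the 20 PA sample barely moves it at all.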


#27    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 14:46

JEH: you are basically discussing one of the points I brought up:

4 - the tendencies of hitters/pitchers: If you have a team filled with lefties or flyballers or whatever, that introduces a bias. If a park is optimal for flyballers and your team is filled with flyballers, the data will not show an accurate park factor. The sample of your players should be representative of your population. For extreme parks, like Fenway and Coors, I’m sure they are not. To what degree, I’m not sure.

In order to have a good park factor, you need to get a REPRESENTATIVE sample of players.  And if you can’t do that, then you have to adjust the bias in your players.

Most generic park factors don’t do that.


#28    JEH      (see all posts) 2010/09/15 (Wed) @ 15:39

TT-

Yes.  I guess my question is: does anyone think we should try and calculate the single season Park Factor on a more representative set of players to make it more applicable to the players being evaluated?  For example, a PF isolating the handedness of the pitcher.

I DO think we need to consider a single season Park Factor when it varies significantly from a historic Park Factor.  The blatant case was the introduction of the humidor at Coors, but I would guess that many non-recurring factors (as you mentioned above) could differentiate the scoring environment in one season from another.  Quantifying those that can be quantified seems worthwhile . . . I hate to leave information on the table, so to speak, when making an evaluation.


#29    JEH      (see all posts) 2010/09/15 (Wed) @ 15:42

TT-

You wrote:

“Most generic park factors don’t do that. “

Are there other Park Factor calculations that attempt to isolate certain variables?


#30    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 16:05

Analysts privately calculate their own more specific park factors. 

As I said, among other things, I look at who actually played in the parks, not presuming that CC pitched exactly one time in the 13 other AL parks and 13 times at Yankee Stadium.


#31    CJE      (see all posts) 2010/09/15 (Wed) @ 17:46

Won’t some parks that generate more home runs also see a decrease in hits of other types? If Citizens Bank has a lot of homers but a smaller outfield, I would think fewer hits would fall in. For example, if Halladay has an average HR/FB% and Citizens Bank is only a high scoring run environment due to homers, making an adjustment for park factor gives Halladay a little too much credit unless decreasing HR/FB% is a skill.


#32    MGL      (see all posts) 2010/09/15 (Wed) @ 19:12

"I DO think we need to consider a single season Park Factor when it varies significantly from a historic Park Factor.”

I meant to respond to this before, but absolutely not!  The fact that one year is completely different from previous years is NOT a reason to use that year alone.  No, no, and no!

Does it suggest that something “real” is different that year?  Absolutely.  But that will be accounted for in the multi-year factor.

#31, right, that is why using components and regressing each one separately (which you pretty much have to do) can be tricky and inaccurate. Which is the main reason why I prefer to use runs (rather than components turned into runs) when I can use many years of data for a park.


#33    tangotiger      (see all posts) 2010/09/15 (Wed) @ 19:42

rather than components turned into runs

I don’t get this.  If you have runs scored, that is a combination of component hitting stats, component baserunning stats, and sequencing of events.  The sequencing of events is huge.  Indeed, the only thing missing between a runs created formula and actual runs scored is the sequencing of components. 

We know that the difference between runs scored and component runs at the team seasonal level is one SD = 22 runs, or roughly 3% = 1 SD.

That means if you have 4 years, then 1SD = 1.5%.

Given that all the PF are so tight, why allow this error to exist?

I just don’t see why we want actual runs and not component runs.
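The step from 3% for one season to 1.5% for four seasons is the usual square-root rule for the standard deviation of an average. A short sketch, assuming the sequencing error is independent from season to season (the function name is hypothetical):

```python
from math import sqrt

# About 22 runs, roughly 3% of a team's seasonal runs (figure from the comment).
SD_ONE_SEASON = 0.03

def sequencing_sd(seasons, sd_one=SD_ONE_SEASON):
    """SD of the average sequencing error over independent seasons."""
    return sd_one / sqrt(seasons)

for n in (1, 4, 10, 20):
    print(f"{n:>2} seasons: 1 SD = {sequencing_sd(n):.2%}")
```

The same rule shows how the error shrinks further with the 10 or 20 seasons of data discussed elsewhere in the thread.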


#34    MGL      (see all posts) 2010/09/15 (Wed) @ 22:02

"I just don’t see why we want actual runs and not component runs.”

Because when you start to regress all the components toward 1.0 (presumably - say you know nothing about the park), you run into problems, such as what #31 is talking about.  And, as you say, unless you include baserunning, you are missing something…


#35    tangotiger      (see all posts) 2010/09/16 (Thu) @ 00:49

The problems of #31 don’t do anything for me.  They exist in regular runs scored too.

And ignoring baserunning (which you don’t have to if you ALSO include it in component-runs) is less costly than including sequencing of events.

Including sequencing is like saying “hey look!  it’s easier to hit with men on base than bases empty at Yankee (or whichever) Stadium!”


#36    MGL      (see all posts) 2010/09/16 (Thu) @ 01:04

Tango, for large samples of data - as I said, for parks that have not changed much or at all, I use maybe 20 years of data (and of course I adjust for the “other parks”), the “sequencing of events” should even out.  I am simply more comfortable using actual runs scored than using multi-year components, regressing them individually, and then converting them into runs. I think this is a reasonable position.  Again, for large sets of data (many years).

I’ve tried both, and using actual runs seems to pass the sniff test better than using components.  Now, for park adjusting individual players (or teams), I definitely use component park factors. But if I want to estimate how a certain park will affect run scoring in the future, I like to use multi-year run factors, using runs scored only.

And of course I adjust for the pool of players as well in the home and away parks.  And weather.


#37    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 02:02

If you are talking about 10 or 20 years, then I agree.

But what about what Sean is doing, using 3 years?  If you intentionally limit yourself to 3 years, and I’m not even sure Sean uses regression, then what?  Are you better off with sequencing?


#38    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 02:06

I am simply more comfortable using actual runs scored than using multi-year components, regressing them individually, and then converting them into runs.

But what if you do single-year component-runs (i.e., BaseRuns) and THEN regress the estimated runs scored?  Why would that be worse than using actual runs scored, and regressing actual runs scored?

I don’t see how regression is a differentiator. 

You lose on a park’s effect on non-stealing baserunning (how much could that be?) while gaining on non-sequencing.


#39    JEH      (see all posts) 2010/09/16 (Thu) @ 08:15

@32/MGL-

Just to be clear, I am not in favor of using a single season Park Factor for predictive purposes, but I am interested in using it for evaluative purposes.  [This can be a bit cloudy, as performance evaluation will be used to make predictions for the following season, but using the historic Park Factor.] I am going to use 2010 Park Factors to evaluate 2010 performance to the extent that I think I can explain the differences in the run environment (See #18).

Are you objecting to that?

I think it would be a mistake to leave known information out of the equation when evaluating performance for an award vote or a free agent signing.


#40    MGL      (see all posts) 2010/09/16 (Thu) @ 11:25

"But what if you do single-year component-runs (i.e., BaseRuns) and THEN regress the estimated runs scored?”

Sure, that’s fine. I never thought about that.

“Are you objecting to that?”

Yes!  Did you not read all of my posts?  Even for retrospective evaluative purposes it is INCORRECT to use actual PF!  And you are not “leaving out known information” when you use multi-year PF’s. You are PROPERLY including that known information.  By using one-year PF’s, whether for evaluative purposes OR for projection purposes, you are leaving out “known information” from other years, information which is important for putting that performance which you are evaluating into context.

As ANOTHER example, let’s say that Kevin Kouzmanoff (of the A’s) hits 60 HR, but his park was being renovated and he was playing his home games in a local Little League park where the fences are 300 feet around.  But, by chance alone, the one year HR PF for that Little League park was 1.00.  Do you want to credit Kouz for all of those HR?  Think about that before you answer…


#41          (see all posts) 2010/09/16 (Thu) @ 11:46

In regards to the daily weather variance, this site (http://www.scoutingbook.com/blog/blowin-in-wind-41710) gives you the game time weather and wind conditions, but unfortunately doesn’t seem to adjust for them accordingly in terms of their rudimentary park factors.

I guess the next step is that somebody needs to figure out the effects of relative temperature and wind direction on run scoring, hook it up with game time weather conditions and generate some sort of basic daily park factor.


#42    JEH      (see all posts) 2010/09/16 (Thu) @ 12:22

"As ANOTHER example, let’s say that Kevin Kouzmanoff (of the A’s) hits 60 HR, but his park was being renovated and he was playing his home games in a local Little League park where the fences are 300 feet around.  But, by chance alone, the one year HR PF for that Little League park was 1.00.  Do you want to credit Kouz for all of those HR?  Think about that before you answer… “

I have no idea where you are going with this example.  In a discussion of expected (historic) behavior versus observed behavior you have introduced an example with no history. 

As far as the more general discussion, you seem to be on the same page as me in #12 with:

“As I said, if you want to account for weather, etc., for THAT year, you still have to use multi-year factors, but you can either weight that year more heavily or you can “manually” adjust for weather if that is even possible.”

but you discount the likely impact and don’t seem willing to take the extra step to attempt to explain the differences between actual and predicted results.

Perhaps I have created some confusion by equivocating on the term Park Factor (I state that I am going to modify it for the current season but still use the same term).  My position is that I am going to take the current season’s observed results for a park and if they disagree with the predicted results then I am going to try and explain them before writing them off to chance variations.  To the extent that I can explain the discrepancies (weather, fences, line-up, game start times, attendance [I just saw an article on that :) ]) I am going to use a modified Park Factor (that may or may not be closer to the current season’s value than the historic value) for evaluating performance.

I think that’s as clear as I can get and if we are still on different pages then I guess we are destined to disagree on this.


#43    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 12:32

As long as we accept that how Fenway affected the observations in 2004 has some relevancy to how Fenway affected observations in 2010, then we should really have no issues here.

Remember the reason that Marcel weights the last 3 years and limits itself to 3 years: humans change.  They get older, stronger, weaker, etc.

That does NOT apply to parks in most cases.  There are some park-related things that are different, notably the weather (temperature, wind). 

As long as you are sensible in why you are discarding data, or overweighting data, then there’s no issue.

