The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball
Authors: Benjamin Baumer, Andrew Zimbalist
One technique for quantifying this notion of reliability is to examine the autocorrelation of the statistic, that is, to measure how closely the statistic tracks previous instances of itself. In Figure 5, we show the autocorrelation of batting average for all batters with at least 250 plate appearances in two consecutive seasons during 1985–2011. Each dot represents one player in two consecutive seasons, with the horizontal coordinate representing his batting average in one season, and the vertical coordinate representing his batting average in the next season. If batting average truly measured an absolute skill (like height), then we would expect both coordinates to be about the same for each dot, and so the points would be arrayed in a diagonal pattern, and the correlation coefficient would be close to 1.
19
This would indicate a high reliability for batting average. Instead, we see a pattern with a great deal of variation, and a fairly low correlation (0.414). This suggests that knowing a player’s batting average in one season does not give a forecaster a very good idea of what it will be in the following season—or more precisely, last year’s batting average explains only 17.1 percent (= 0.414 × 0.414) of the variation in a player’s batting average the next year.
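As a rough illustration of the calculation described above, the following sketch computes a season-to-season correlation and the share of variation it explains. The six pairs of batting averages are made up for the example and are not the data behind Figure 5; only the 0.414 figure comes from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical batting averages for six invented players (assumption,
# for illustration only): season_1[i] and season_2[i] belong to the
# same player in consecutive seasons.
season_1 = [0.312, 0.287, 0.265, 0.301, 0.248, 0.279]
season_2 = [0.281, 0.295, 0.270, 0.266, 0.259, 0.302]

r = pearson_r(season_1, season_2)
r_squared = r ** 2  # share of next-season variation explained
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")

# The figure quoted in the text: squaring 0.414 gives about 0.171.
book_r = 0.414
book_r_squared = book_r ** 2
print(f"{book_r_squared:.3f}")
```

Squaring the correlation is what converts the 0.414 reported for batting average into the 17.1 percent of variation explained.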
Figure 5. Batting Average, Batters
Batting average for batters exhibits a relatively low autocorrelation, limiting its predictive value and suggesting that it measures a great deal of chance along with batter skill. Here at least 250 plate appearances in each season are required.
Table 3. Properties of Batting Statistics
Mercifully, not every statistic suffers from low reliability. In fact, the rates at which batters strike out, walk, and hit home runs are much more reliable. As a result, it is relatively easy to predict how often a batter will do each of these things in the future, since simply assuming that he will repeat the previous year’s performance is likely to be a good estimate. In Table 3, we show autocorrelations for many commonly used statistics, alongside the correlation with team runs scored. Note that statistics that are accurate (like OPS and SLG), in terms of the strength of their respective correlation with a team’s runs scored, are not necessarily reliable, in the sense of being highly correlated with themselves over time. Thus, knowing a player’s OPS gives us a good sense of how much he contributed to his team’s offense in a given season, but it doesn’t reveal all that much about what he is likely to do next season. Conversely, knowing a player’s strikeout rate in one season gives us a good sense of what that player’s strikeout rate will be in the next season (see Figure 6), but it is a poor measure of his overall performance.
Figure 6. Strikeout Rate, Batters
Strikeout rate for batters exhibits a relatively high autocorrelation. This makes it quite predictable and suggests that it truly measures an attribute of a batter.
It is natural at this point to question whether the high reliability of strikeout rate is of any practical significance. Since its paltry accuracy suggests that it is not a good measure of overall performance, why should we care what a player’s strikeout rate is? One answer is that hitting is a zero-sum game, in the sense that there are only so many outcomes. Thus, if a player’s strikeout rate is very high, that leaves less room for other, more desirable outcomes of his plate appearances.
20
At a certain point, it becomes extremely difficult for him to be productive, since his production is throttled by a high strikeout rate. For example, there have been only five players in major league history who have had a strikeout rate of at least 30 percent in two thousand or more career plate appearances.
21
All five hit for power, and four of the five walked at a high rate, so, for the most part, each was doing all he could do to compensate for his high strikeout rate. But none managed a .250 batting average, only one managed an OBP above .333 (typically the league average is around here), and none managed an OPS above .815, which is about what you would expect from a power hitter. The point is that although we might not care about a hitter’s strikeout rate on its own, because the sum of all of his rates is one, and the strikeout rate is easy to predict, it provides an upper bound on the hitter’s potential production. Twenty times, a player has qualified for the batting title (by having sufficient plate appearances) in a season in which he struck out in 30 percent of his plate appearances, but in none of those seasons did that player hit .300. It has never happened because to do so would require an astronomical batting average on balls in play (BABIP).
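To see why a .300 average is so hard to reach with a 30 percent strikeout rate, one can solve for the BABIP it would require. The sketch below rests on a simplifying assumption of ours: every plate appearance ends in a home run, a strikeout, a walk or hit-by-pitch, or a ball in play, and sacrifices and other rare events are ignored. The home run and walk rates in the example are hypothetical.

```python
def required_babip(target_avg, hr_rate, so_rate, wk_rate):
    """BABIP needed to reach target_avg, given per-PA rates.

    Simplified model (assumption): at-bats are all plate appearances
    that are not walks or hit-by-pitches; sacrifices are ignored.
    """
    ab_share = 1.0 - wk_rate                        # at-bats per PA
    bip_share = 1.0 - hr_rate - so_rate - wk_rate   # balls in play per PA
    if bip_share <= 0:
        raise ValueError("rates leave no room for balls in play")
    hits_needed = target_avg * ab_share             # hits per PA
    return (hits_needed - hr_rate) / bip_share      # hits on BIP / BIP

# Hypothetical power hitter: 5% home runs, 10% walks/HBP per PA.
high_k = required_babip(0.300, hr_rate=0.05, so_rate=0.30, wk_rate=0.10)
low_k = required_babip(0.300, hr_rate=0.05, so_rate=0.20, wk_rate=0.10)
print(f"30% strikeouts: BABIP of about {high_k:.3f}")  # roughly .400
print(f"20% strikeouts: BABIP of about {low_k:.3f}")   # roughly .338
```

Under these assumed rates, raising the strikeout rate from 20 to 30 percent pushes the required BABIP from the high .330s to about .400, which is the sense in which strikeouts cap a hitter's production.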
In Figure 7, we depict the most common outcomes of a plate appearance (PA) in the form of a tree diagram, with the relative frequencies of events indicated by the area of the circle containing each event. If the circle representing strikeouts gets bigger, then at least one of the other circles must get smaller. For reasons that will become clear later on, we have chosen to first separate all outcomes into either balls in play (BIP), or balls not in play (BNIP), where a home run is considered not in play, since in almost all cases, no fielder has a chance to make a play on a ball that is hit for a home run.
22
It is possible to write the formulas for batting average and on-base percentage almost entirely as a function of the four quantities shown in gray in Figure 7: home runs, strikeouts, and walks per plate appearance (HR/PA, SO/PA, and WK/PA, respectively), and batting average on balls in play (BABIP).
23
Thus, in order to predict future values of batting average and OBP, one only needs to have estimates of those four quantities. Felicitously, we can see from Table 3 that the three quantities for which the ball is not in play (HR, SO, WK) typically demonstrate high reliability, making them relatively easy to predict. Unfortunately, the fourth quantity (BABIP) has the lowest reliability of any of the batting statistics we listed, making it difficult to predict accurately. Moreover, because the ball is put into play about 70 percent of the time on average, balls in play account for by far the largest share of outcomes, and batting average on balls in play thus has a tremendous influence on hitting production.
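The dependence of batting average and OBP on the four quantities can be sketched in a few lines. The formulas below are an approximation of ours, not necessarily the exact ones the authors have in mind: they ignore sacrifices and other rare events, which is one reason the relationship holds only "almost entirely." The input rates are hypothetical.

```python
def batting_average(hr_rate, so_rate, wk_rate, babip):
    """Approximate BA from per-PA rates (assumption: no sacrifices)."""
    bip_share = 1.0 - hr_rate - so_rate - wk_rate  # balls in play per PA
    hits = hr_rate + babip * bip_share             # hits per PA
    at_bats = 1.0 - wk_rate                        # at-bats per PA
    return hits / at_bats

def on_base_percentage(hr_rate, so_rate, wk_rate, babip):
    """Approximate OBP: times on base per plate appearance."""
    bip_share = 1.0 - hr_rate - so_rate - wk_rate
    return hr_rate + wk_rate + babip * bip_share

# Hypothetical hitter who puts the ball in play 70% of the time.
rates = dict(hr_rate=0.03, so_rate=0.18, wk_rate=0.09)

for babip in (0.300, 0.330):
    ba = batting_average(babip=babip, **rates)
    obp = on_base_percentage(babip=babip, **rates)
    print(f"BABIP {babip:.3f}: BA about {ba:.3f}, OBP about {obp:.3f}")
```

With these assumed rates, a 30-point swing in BABIP moves batting average by more than 20 points, which illustrates why the least reliable of the four quantities matters so much.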
Figure 7. Tree Diagram Depicting the Outcomes of a Plate Appearance
The area of each node is proportional to the overall frequency with which it occurs, on average. The width of each edge is proportional to its relative frequency with respect to its parent event.
Note: WK combines all walks and hit-by-pitches, since the two have exactly the same effect. UBB are unintentional walks. HBIP is “hits on balls in play,” so the critical BABIP ratio is HBIP/BIP. The OUT circle simply captures all nonhits on balls in play, with GO and AO representing ground outs and air outs, respectively.
With this perspective, the major difficulty in predicting a hitter’s future performance is estimating his batting average on balls in play, since estimating the three other quantities can be done, for example, by simply taking a weighted average of the three most recent years, and incorporating the effects of player age and ballpark.
24
Due to its low reliability, this approach will not be very effective with BABIP. To the best of our knowledge, research that provides a significantly better way to predict BABIP either does not exist or has not been released to the public.
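The weighted-average approach mentioned above might look something like the following sketch. The weights of 5, 4, and 3 (most recent season first) are an assumption chosen for illustration, the strikeout and plate appearance totals are invented, and the age and ballpark adjustments are omitted entirely.

```python
def weighted_rate(events, plate_appearances, weights=(5, 4, 3)):
    """Weighted per-PA rate over recent seasons, most recent first.

    The default weights are an illustrative assumption; any set of
    positive weights that favors recent seasons would serve.
    """
    if not (len(events) == len(plate_appearances) == len(weights)):
        raise ValueError("one entry per season is required in each input")
    weighted_events = sum(w * e for w, e in zip(weights, events))
    weighted_pa = sum(w * pa for w, pa in zip(weights, plate_appearances))
    return weighted_events / weighted_pa

# Hypothetical batter: strikeouts and plate appearances in his three
# most recent seasons, listed most recent first.
strikeouts = [120, 110, 130]
pas = [600, 580, 620]

so_rate = weighted_rate(strikeouts, pas)
print(f"estimated strikeout rate: about {so_rate:.3f}")
```

Because strikeout, walk, and home run rates are reliable, an estimate of this kind tends to be a reasonable starting point for them, whereas the same recipe applied to BABIP inherits that statistic's noise.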
The idea that the reliability of a statistic affects its predictive value has been incorporated into many different models of hitting. It is often couched as regression to the mean, a longstanding statistical phenomenon that governs the behavior of random variables. Briefly, there is always a nonzero probability that random variables will take on unlikely values. But the likelihood of doing so repeatedly is, by definition, much smaller. Thus, unlikely observations appear to regress toward the expected value (i.e., the mean) upon repeated measurement in the future. For instance, if a player has an OBP of .370 in 2012 and the mean OBP is .333, then we would expect his OBP in 2013 to be somewhere between .333 and .370. Applications of regression to the mean in sports are copious, encompassing both the sophomore slump and the Sports Illustrated cover jinx, among other memes.
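A small simulation can make the phenomenon concrete. In this sketch, each simulated player has a fixed true OBP, and each season's observed OBP is that true value plus random noise. The spread of talent, the size of the noise, and the normal distributions are all assumptions made for illustration, not estimates from real data.

```python
import random

def simulate_regression(n_players=2000, league_mean=0.333,
                        talent_sd=0.020, noise_sd=0.020, seed=1):
    """Return mean OBP of season-1 leaders in seasons 1 and 2,
    plus the overall season-2 mean. All parameters are assumed."""
    rng = random.Random(seed)
    talent = [rng.gauss(league_mean, talent_sd) for _ in range(n_players)]
    season_1 = [t + rng.gauss(0.0, noise_sd) for t in talent]
    season_2 = [t + rng.gauss(0.0, noise_sd) for t in talent]

    # Select the top 10 percent of players by season-1 OBP.
    order = sorted(range(n_players), key=lambda i: season_1[i], reverse=True)
    top = order[: n_players // 10]

    top_s1 = sum(season_1[i] for i in top) / len(top)
    top_s2 = sum(season_2[i] for i in top) / len(top)
    overall = sum(season_2) / n_players
    return top_s1, top_s2, overall

top_s1, top_s2, overall = simulate_regression()
print(f"leaders, season 1: {top_s1:.3f}")
print(f"leaders, season 2: {top_s2:.3f}")
print(f"everyone, season 2: {overall:.3f}")
```

The season-1 leaders remain above average the following year, but by a smaller margin, because part of what put them at the top was favorable noise that does not repeat.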
Regression to the mean is fundamental to virtually every well-known projection system, since it tends to reduce the error in future predictions. However, the question of how one should employ regression to the mean remains open. Typically, projection systems work by blending two relevant estimates of a hitter’s future performance: the estimate concocted for the player by the system, and the league average value of that statistic. The final estimate of the future statistic for that player will then be a weighted average of the estimate generated for him, and the league average. This averaging regresses the estimate for the player toward the mean. But by how much?
In Tom Tango’s Marcel projection system,
25
regression to the mean is incorporated via a function of how many plate appearances a player had in each of the previous three seasons. So to arrive at the estimate for Albert Pujols’s batting average in 2004, you would combine a weighted average of his batting average over the three previous seasons (about .336), with a weighted average of the batting average for all position players over that same time span (.268), according to a weight (about 0.87) based on the number of plate appearances that Pujols had. The result (.328) reflects your belief in the observations you have made about Pujols’s hitting ability, but also your knowledge of the hitting ability of position players as a whole. These ad hoc notions can also be formalized and refined into the language of Bayesian statistics, where a prior belief about the population (i.e., hitting ability of all position players) is combined with observations about a specific player (i.e., Pujols) via techniques that have been proven to be optimal under certain assumptions.
26
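The blending step in the Pujols example amounts to a single weighted average, sketched below with the rounded figures quoted in the text. The function name is ours, and the sketch takes the weight as given rather than deriving it from plate appearances.

```python
def regress_toward_mean(player_estimate, league_mean, weight):
    """Blend a player's own estimate with the league mean.

    weight is the share of belief placed in the player's own record;
    the remainder goes to the league average.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return weight * player_estimate + (1.0 - weight) * league_mean

# Rounded inputs quoted in the text for Pujols's 2004 projection.
projection = regress_toward_mean(player_estimate=0.336,
                                 league_mean=0.268,
                                 weight=0.87)
print(f"projected batting average: about {projection:.3f}")
```

With these rounded inputs the blend comes out near .327, in line with the .328 reported once rounding of the "about" figures is taken into account. A player with fewer plate appearances would receive a smaller weight and so be pulled further toward .268.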