The current 2012–2016 CBA solidifies the incentive gains achieved in the previous CBA and introduces some new mechanisms that should further promote balance. Among the new mechanisms, the Rule 4 amateur draft will now be subject to a tax and cap plan, international signings will also be subject to a cap, and there will be a competitive balance lottery giving extra draft picks to low-finishing teams. The cap and tax plan works like this. The team with the lowest win percentage the prior year is allocated the highest cap, while the team with the highest win percentage is given the lowest cap.[13] For 2012, the per-team caps run from $4.47 million to $11.49 million. If a team exceeds its limit, it will be subjected to a tax of between 75 and 100 percent and face the possible loss of future picks. These changes are all explicitly designed to give the low-revenue teams more picks and to maximize the possibility
that they will be able to sign the picks they make. Baseball, of course, has also added a second wild card team in each league, commencing in 2012, which will make for more exciting pennant races and provide more opportunities for postseason appearances.
The foregoing discussion has omitted several elements that are part of the context of MLB’s economic system and, instead, has focused on the role of incentives in the design of an effective system to promote competitive balance. Economic theory teaches that incentives matter and MLB has provided a laboratory over the past fifteen years that corroborates this concept. Of course, designing a system and implementing it are two different matters; the latter always confronts the reality of political divisions and constraints. In this regard, MLB is no different than any other sports league or decision-making body.
Sabermetricians measure performance—mostly performance on the field, but also in the dugout and in the front office. Through new metrics and analysis, they seek to tell us what produces wins and profits.
In this chapter, we turn the tables by endeavoring to measure the output of sabermetricians. In Chapter 1, we expressed skepticism about the story told by Michael Lewis in Moneyball, or at least about the details of that story. Lewis may have missed a few basic points and misrepresented several others, but that doesn’t mean that the underlying message was wrong. If Moneyball were nothing more than an intriguing fable, it is unlikely that sabermetrics would have spread like wildfire throughout team front offices, as it did in the ensuing years, and as we documented in Chapter 2.
Certain sabermetric insights, whether they originated in the work of F. C. Lane, Allan Roth, George Lindsay, Earnshaw Cook, Pete Palmer, Bill James, Vörös McCracken, Tom Tippett, or others, are of indisputable value. There is, for instance, no rational reason to think that batting average is a greater contributor to wins than on-base percentage, that fielding percentage is more important than DER, or that ERA is a more meaningful indicator of future pitching prowess than FIP.
Indeed, consider the following statistical evidence. One way to parse the relative importance of the saber-inspired metrics of OBP, DER, and FIP versus the traditional metrics of BA, FPCT, and ERA is to run a regression of win percentage on each set of three variables. We did this for the years 1985 through 2011 and the results were clear. For model (1) below, the coefficient of determination, or R², is 0.80. That is, 80 percent of the variance of win percentage is explained by the variance of OBP, DER, and FIP. Conversely, the R² for model (2) is 0.69.
(1) WPCT = f(OBP, DER, FIP)
(2) WPCT = f(BA, FPCT, ERA)
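For readers who want to reproduce this kind of comparison, a minimal sketch in Python with statsmodels follows. The file name teams.csv and its column names are hypothetical stand-ins for a table of team-seasons (1985–2011), not the data set analyzed here.

```python
# Sketch: compare R^2 of saber-inspired vs. traditional team metrics.
# Assumes a hypothetical CSV of team-seasons with columns:
# wpct, obp, der, fip, ba, fpct, era -- one row per team-season.
import pandas as pd
import statsmodels.formula.api as smf

teams = pd.read_csv("teams.csv")  # hypothetical file name

# Model (1): WPCT = f(OBP, DER, FIP)
saber = smf.ols("wpct ~ obp + der + fip", data=teams).fit()

# Model (2): WPCT = f(BA, FPCT, ERA)
trad = smf.ols("wpct ~ ba + fpct + era", data=teams).fit()

# The text above reports R^2 of 0.80 for model (1) and 0.69 for model (2).
print(f"saber-inspired R^2: {saber.rsquared:.2f}")
print(f"traditional R^2:    {trad.rsquared:.2f}")
```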
Thus, the saber-inspired metrics explain 11 percentage points more of the variance in win percentage than do the conventional metrics (0.80 versus 0.69). Other things being equal, this suggests that a GM making use of saber-inspired metrics will have an 11-percentage-point advantage toward putting together a winning team, as opposed to a non-saber-inspired GM.
But this is easier said than done, as there are many obstacles to moving from saber-inspired metrics to building a better team. First, sabermetrics itself is a moving target, since the theory and practice have evolved considerably over the past twenty years. On many issues, there is no consensus as to what the correct sabermetric interpretation even is. For example, the debate about the value of stolen bases continues to this day. Most people who have studied the issue have found that the cost of the out lost by getting caught stealing is roughly twice as large as the value gained by swiping a base. This insight led many teams, most notably the A’s and Red Sox, who were purported adherents of sabermetrics, to become extremely conservative on the basepaths. In fact, among the last twenty team-seasons with the fewest stolen bases, half belong to the A’s and Red Sox, with no other franchise appearing more than once on the list.
However, suppose that you were Tampa Bay, and that you already had Carl Crawford on your team. You knew that Crawford stole bases with a historically high success rate (approaching 90 percent), and so you were convinced that by any reasonable accounting, his baserunning would make a positive contribution to your offense. Should you discourage him from stealing bases? Of course not. Thus, while the prevalence of stolen bases is not a perfect metric for estimating sabermetric intensity, it does produce several notable true positives in this context.
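The arithmetic behind that conviction is straightforward: if being caught costs roughly twice what a successful steal gains, a runner breaks even at about a two-thirds success rate, and a 90 percent base stealer is comfortably in positive territory. A minimal sketch follows; the run values are illustrative assumptions, not figures from this chapter.

```python
# Sketch: break-even stolen-base success rate, assuming the rough
# 2:1 cost ratio cited above (run values below are illustrative).
SB_RUNS = 0.2    # illustrative run value of a successful steal
CS_RUNS = -0.4   # illustrative run cost of being caught (about 2x SB_RUNS)

def expected_runs_per_attempt(success_rate: float) -> float:
    """Expected run value of one steal attempt at a given success rate."""
    return success_rate * SB_RUNS + (1 - success_rate) * CS_RUNS

# Break-even: p * SB_RUNS + (1 - p) * CS_RUNS = 0  =>  p = 2/3 here.
break_even = -CS_RUNS / (SB_RUNS - CS_RUNS)
print(f"break-even success rate: {break_even:.0%}")              # ~67%
print(f"runs/attempt at 90%: {expected_runs_per_attempt(0.90):+.3f}")
print(f"runs/attempt at 60%: {expected_runs_per_attempt(0.60):+.3f}")
```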
Second, sometimes the metric in question is easy to identify at the team level, but difficult to identify at the player level. For instance, DER, or defensive efficiency rating,[1] is roughly equal to 1 – BABIP, where BABIP is batting average on balls in play. Hence, DER measures the ability of the team in the field to convert a ball put in play (i.e., not a home run, not a walk, and not a strikeout) into an out. This measure is broader than the conventional fielding percentage, because the latter just measures whether fielders can convert a ball hit in their range (whatever it may be) into an out, whereas DER also encompasses the fielders’ ability to get to the ball (their positioning, jump, speed, balance, etc.) and, if necessary, throw it in a timely and accurate manner to the relevant base. One issue is that DER is not a pure measure of fielding prowess, because part of it measures luck (whether balls happen to be hit right at a player or take a bad hop), part of it measures whether the fielders were positioned properly by the coaches, part of it measures whether the pitcher threw the pitch to the part of the plate that was signaled, and part measures the opposing team’s hitting (how hard and at what trajectory a ball was hit).
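As a concrete illustration, team DER can be approximated from ordinary box-score totals. The sketch below uses one common approximation of 1 – BABIP; the helper name and the sample season totals are hypothetical.

```python
# Sketch: approximate team DER as 1 - BABIP from season totals.
# Inputs are from the defense's perspective (hits and home runs
# allowed, opponents' at-bats, etc.). One common approximation.

def team_der(hits: int, home_runs: int, at_bats: int,
             strikeouts: int, sac_flies: int = 0) -> float:
    """DER ~= 1 - (H - HR) / (AB - SO - HR + SF): the share of balls
    in play (excluding homers) that the defense converts into outs."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return 1 - (hits - home_runs) / balls_in_play

# Hypothetical season totals allowed by a team's pitching staff:
der = team_der(hits=1350, home_runs=160, at_bats=5500,
               strikeouts=1200, sac_flies=45)
print(f"DER = {der:.3f}")  # ~0.716 for these made-up totals
```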
While measuring DER for a team is straightforward, it cannot be observed directly for an individual player. In order to capture an individual fielder’s defensive prowess, we have to know his sure-handedness, the accuracy and strength of his arm, his instinct for positioning, the jump he gets on a hit ball, the speed and balance with which he runs to the ball, the velocity, trajectory, and spin of the hit ball, and so on. While there have been forays into quantifying these factors for individual players, such as UZR (as we discussed in Chapter 4), these measures are still in the early stages of development, display erratic results, and are often proprietary (meaning that the details of the computation are unknown to the general public, making it difficult, if not impossible, to assess the validity of the underlying methodology). Thus, while the use of DER gives us good predictive power at the team level, a GM would have a formidable task in discerning how to select individual players based on this concept.
Nonetheless, although the precise quantification of an individual player’s fielding skill may be elusive statistically, the recognition of the importance of DER in team success is significant in itself. Even without a new, reliable metric, teams can train scouts and analysts to track a player’s defensive skill set more closely, through either live or video observation, and thereby improve their capacity to build a defensively strong team. As we will discover, it appears that the Tampa Bay Rays have done precisely this and that the team’s remarkable success since 2008 owes much to this development.
Third, another link between identifying more useful metrics and engendering strong team performance resides in a team’s development system and its coaches. Players’ skill sets and proclivities are not static; they change and develop over time. A GM might make what appears to be an optimal player acquisition, but unless that player is put in the right environment and developed appropriately, the payoff for the team might be small.
Fourth, the next conundrum for the saber-savvy GM is that other teams may have identified the same new metrics, raising the market demand for the associated abilities and, hence, the price for the new player skills. If the new metric is 10 percent more closely associated with wins, but the price of the related skill rises 10 percent, then the saber-savvy GM may be no better off. We will consider this potential dynamic in more detail shortly.
Fifth, suppose our insightful GM does everything right and identifies an ideal draft pick, free agent, or player to be acquired in a trade. The draft pick might go to another team with an earlier pick, the free agent might want to play in a different city or accept a higher offer, and the prospective trade might not be consummated for any number of reasons. Nothing is gained.
Sixth, a GM may receive good advice from a metrician, but choose to ignore it.
Last, an acquired player may become injured or feel less comfortable with the players and coaches on the new team, and, thus, be less productive than anticipated. Baseball is a game infused with chance and uncertainty . . . and may it stay that way.
For any one of the above reasons, the nexus between a sabermetric insight and greater team success can be disrupted. Accordingly, measuring the output (or value) of sabermetricians is not an easy matter. Nevertheless, since sabermetricians purport to measure the contribution of others, it makes sense, despite the inherent complications, to attempt to measure the contribution of sabermetricians.
Hence, after some ten years of experience with self-conscious sabermetrics in baseball’s front offices, we ask: What’s the record? Has the adoption of sabermetrics paid off for teams? To answer these questions, we follow a two-step analysis. First, we attempt to identify the intensity of sabermetric practice (saber-intensity) across baseball’s thirty teams. Second, we see how this intensity is correlated with team performance, after controlling for team payroll.
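In sketch form, the second step amounts to a regression like the following; the file and column names (saber_intensity, payroll, wpct) are hypothetical stand-ins, not our actual data.

```python
# Sketch: does saber-intensity correlate with winning, controlling
# for payroll? File and column names are hypothetical stand-ins.
import pandas as pd
import statsmodels.formula.api as smf

teams = pd.read_csv("saber_intensity.csv")  # hypothetical file

# Regress win percentage on saber-intensity, holding payroll constant;
# a positive, significant coefficient on saber_intensity would suggest
# that sabermetric practice pays off beyond what payroll buys.
model = smf.ols("wpct ~ saber_intensity + payroll", data=teams).fit()
print(model.summary())
```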
Identifying saber-intensity might sound more straightforward than it is. If we try to measure saber-intensity by counting front office personnel doing sabermetric analysis, we encounter various obstacles. The first is that team organizational charts (or even media guides) are not always easy to obtain, not always complete, and not always clear. As indicated in Chapter 2, people doing sabermetric analysis don’t always have “sabermetric” titles. The second is that in many instances the role of scout and the role of sabermetrician are less and less distinct; increasingly, the scout uses new statistical metrics and the sabermetrician employs video to analyze player performance. The third is that what happens in a baseball operations statistical office does not always connect with what happens in the GM’s office or what happens on the field.
Another confounding factor is that sometimes decisions have all the earmarks of being sabermetrically inspired, but are in reality the result of nothing more than gut instinct, idiosyncrasy, or luck. Take, for instance, the signing of free agent David Ortiz by the Red Sox in 2003. The new ownership at the time already had a strong reputation for being sabermetrically oriented. Principal owner John Henry had attempted to lure Billy Beane away from the Oakland A’s, had signed Bill James to be the lead statistical analyst for the Red Sox, and had hired the 28-year-old, statistically oriented Yale graduate Theo Epstein to be the team’s GM. Henry’s own background in finance involved developing statistical models to track commodities and currency prices.
By 2004, the new Red Sox ownership had assembled enough talent to win the team’s first World Series since 1918. The conventional wisdom is that the Red Sox success was another feather in the cap of sabermetrics. Ortiz’s last three years with the Minnesota Twins showed a mediocre batting average of .265, but he demonstrated both power (isolated power of .205) and plate discipline (he walked in nearly 11 percent of his plate appearances).[2] He was a sabermetric diamond in the rough.
Pre-eminent baseball writer and television commentator Tom Verducci told Andrew Zimbalist that the Red Sox were a clear illustration of the success of sabermetrics and that the signings of Bill Mueller and David Ortiz were prime examples of the method at work. Verducci is not alone in this assessment, and there is little question that sabermetrics had a hand in the assembling of the Red Sox 2004 roster (though a good deal of credit has to go to the team’s GM from 1994 through March 2002, Dan Duquette).[3]