The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball

61. The coefficient of variation is the standard deviation divided by the mean. It adjusts for the increase in average payrolls over time.
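
As a minimal sketch of the definition in this note, the snippet below computes a coefficient of variation for two sets of payrolls. The payroll figures are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the note does not say which is meant.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical team payrolls (millions of dollars) in an early season.
early_payrolls = [50, 100, 150]
# The same hypothetical league after every payroll has doubled.
later_payrolls = [2 * p for p in early_payrolls]

early_cv = coefficient_of_variation(early_payrolls)
later_cv = coefficient_of_variation(later_payrolls)
print(round(early_cv, 4), round(later_cv, 4))
```

Both values come out the same, which is the sense in which the measure adjusts for growth in average payrolls: scaling every payroll by a common factor leaves it unchanged.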

62. Frank and Murray are both quoted by Alan Schwarz in The Numbers Game, p. 131.

Chapter 2. The Growth and Application of Baseball Analytics Today

1. Lewis, Moneyball, pp. 89–90.

2. Lewis, Moneyball, p. 95.

3. S. T. Jensen, K. E. Shirley, and A. J. Wyner, “Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball,” Annals of Applied Statistics 3, no. 2 (2009), pp. 491–520.

4. Of course, in those cases where a former baseball operations person is the team president, such as with the Cubs and Marlins, it is possible to think of this position as the third, instead of the second, post.

5. Scott Sherman, “Rethinking America’s Pastime: The Paul DePodesta Story,” Harvard Crimson, May 5, 2012.

6. Also not included in our count are consultants, because these relationships are nearly impossible to identify and unravel. However, such arrangements certainly exist. In early 2013, prominent sabermetrician Tom Tango signed an exclusive consulting services contract with Theo Epstein and the Chicago Cubs. (See Jon Greenberg, “Q&A: New Cubs ‘Saberist’ Tom Tango,” ESPNChicago.com, January 30, 2013, http://espn.go.com/blog/chicago/cubs/post/_/id/14619/qa-new-cubs-saberist-tom-tango). But since we can’t know the extent and depth of Tango’s contribution, we felt it best to exclude all consultants.

7. Jim Duquette, personal communication via email, April 17, 2013.

8. Phillies GM Ruben Amaro observed: “Since I’ve been here, we don’t have an inhouse stats guy and I kind of feel we never will. We’re not a statistics-driven organization by any means.” Doug Miller, “New Defensive Stats Starting to Catch On,” MLB.com, January 11, 2010. Amaro publicly reconsidered his antithetical stance toward sabermetrics after the team’s poor 2013 season, stating, “We may be looking to fortify some of our information with some more statistical analysis. We have to look at the way we do things and try to improve. . . . I’m not so stubborn that we can’t try to do things a little bit different.” Tyler Kepner, “Fresh Leadership for Stale Phillies,” New York Times, August 26, 2013.

9. Personal communications.

10. For example, Karl Mueller’s job title with the Brewers is director, video scouting and baseball research.

11. For example, Chris Long of the San Diego Padres has an M.A. in mathematics from Rutgers University, and was ABD (all but dissertation) in the Ph.D. program in statistics.

12. Keith Woolner of the Cleveland Indians has an M.S. in decision analysis from Stanford University.

13. Formerly, Jim Cassandro of the Arizona Diamondbacks would have qualified. One of the present authors, Ben Baumer, briefly would have been in this category with the New York Mets. Daniel Mack earned a Ph.D. in computer science before joining the Royals in 2013.

14. Lewis, Moneyball, p. 99.

15. While the book by Jonah Keri, The Extra 2%: How Wall Street Strategies Took a Major League Baseball Team from Worst to First (New York: Ballantine Books, 2011) appropriately calls attention to the innovative nature of the Tampa Bay Rays management, it misrepresents the substance of the Rays’ management practices.

16. Lewis, Moneyball, p. 18.

17. Lewis, Moneyball, p. 85.

18. Lewis, Moneyball, p. 122.

19. Lewis, Moneyball, pp. 96, 99.

20. Tyler Kepner, “Astros’ Luhnow Took Short Walk to New Job,” New York Times, Dec. 17, 2011.

21. See http://www.hardballtimes.com/main/article/interview-carlos-gomez-mlb-scout/.

22. See http://www.7dvt.com/2011red-sox-baseball-scout-galen-carr.

23. Lewis, Moneyball, p. 128.

24. The very notion that player evaluation ever could be separated into two distinct boxes, objective and subjective, is fraught with irony. Consider Lewis’s comment on p. 16 of Moneyball: “High school pitchers . . . were able to generate the one asset that scouts could measure: a fastball’s velocity.” Thus, it is implied that scouts base their evaluations largely upon a quality that can be objectively measured.

25. Lewis, Moneyball, p. 241.

26. The answer is exactly 100 players. Mark McGwire, with 583 career home runs but only 252 doubles, has the largest discrepancy.

27. He hit .386 on odd-numbered days and .323 on even-numbered days. This example is, of course, of no practical significance, but it illustrates the level of detail in the data.

28. Lewis, Moneyball, p. 88.

29. To be clear, Friedman worked for the Rays for two years before becoming their GM. During this time Sternberg had a noncontrolling interest in the team.

30. Nate Silver, The Signal and the Noise (New York: Penguin Press, 2012), p. 107.

31. Antonetti said as much during “Covering the Bases—An Evening with Our GMs,” Mark H. McCormack Department of Sport Management at the Isenberg School of Management, UMass-Amherst, November 13, 2012.

32. For comparison, the batting table of the LahmanDB contains about 100,000 rows, so while you can’t store it in Excel 2003, you can store it in Excel 2007. On the other hand, the Retrosheet events table has almost 10 million rows, rendering Excel useless.
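
The arithmetic behind this note can be sketched as follows. The worksheet limits are the documented maximums for the two Excel versions (65,536 rows through Excel 2003, 1,048,576 rows from Excel 2007 on); the table sizes are the approximate figures quoted above, not exact counts.

```python
EXCEL_2003_MAX_ROWS = 2 ** 16  # 65,536 rows per worksheet
EXCEL_2007_MAX_ROWS = 2 ** 20  # 1,048,576 rows per worksheet

# Approximate row counts as quoted in the note, not exact table sizes.
TABLE_ROWS = {
    "Lahman batting": 100_000,
    "Retrosheet events": 10_000_000,
}


def fits_in_sheet(row_count, max_rows):
    """Return True if a table with row_count rows fits in one worksheet."""
    return row_count <= max_rows


for name, rows in TABLE_ROWS.items():
    print(
        name,
        "| Excel 2003:", fits_in_sheet(rows, EXCEL_2003_MAX_ROWS),
        "| Excel 2007:", fits_in_sheet(rows, EXCEL_2007_MAX_ROWS),
    )
```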

33. Bloomberg Sports claims to have relationships with twenty-five of the thirty MLB clubs, eight to twelve of which are “enterprise clients.” While exact figures are not publicly known, our information suggests that these clubs pay in the neighborhood of $100,000 per year for Bloomberg to handle their entire baseball information operation.

34. James has played a complicated role in this evolution. In Moneyball, he discusses the impetus for Project Scoresheet: ‘The lack of critical data means that “we as analysts of the game are blocked off from the basic source of information which we need to undertake an incalculable variety of investigative studies” ’ (p. 84). Yet James himself joined STATS, Inc. and has not played an active role in Retrosheet.

35. Chris Jaffe, “Interview: Dave ‘Retrosheet’ Smith,” The Hardball Times, September 5, 2007. http://www.hardballtimes.com/main/article/interview-dave-retrosheet-smith/.

36. Unlike PITCHf/x, which records this data objectively through the use of high-speed cameras, STATS and BIS compile their pitch-by-pitch data from (usually multiple) human observers, who are either in the ballpark or watching the game on TV.

37. The thirty clubs nominally pay a uniform fee to MLBAM each year. However, in practice MLBAM generates several hundred million dollars of profit yearly, a healthy share of which is distributed to the clubs.

38. Paul DePodesta, “Mets Exec: More Data Doesn’t Mean Better Data,” CNBC, March 28, 2013. http://www.cnbc.com/id/100597953.

Chapter 3. An Overview of Current Sabermetric Thought I

1. Lewis, Moneyball, p. 124.

2. All original figures were created using the mosaic package for R. See Randall Pruim, Daniel Kaplan, and Nicholas Horton (2012). mosaic: Project MOSAIC (mosaic-web.org) Statistics and Mathematics Teaching Utilities. R package version 0.6-2. http://CRAN.R-project.org/package=mosaic. More specifically, all figures use the lattice package for graphics (Deepayan Sarkar, 2008, Lattice: Multivariate Data Visualization with R. Springer, New York) and, of course, R itself (R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/).

3. Strangely, NBA Houston Rockets’ GM Daryl Morey is apparently responsible for both estimates. While James’s model first appeared in the 1980 Baseball Abstract, Morey estimated the value of the exponent in many other sports. See Aaron Schatz, “BackTalk: Keeping Score; Follow the Points to Find a Super Bowl Champ,” New York Times, January 23, 2005. Morey’s work on basketball showed an exponent of 13.91 and was published in STATS Basketball Scoreboard, 1993–1994, STATS, Inc., October 1993, p. 17.
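
The exponent discussed in this note is the one in the Pythagorean won-loss formula, which in its usual form estimates winning percentage as scored^x / (scored^x + allowed^x). The sketch below assumes that form, with James's original exponent of 2 as the default and the basketball exponent of 13.91 quoted above. The season totals are hypothetical and do not describe any real team.

```python
def pythagorean_win_pct(scored, allowed, exponent=2.0):
    """Expected winning percentage from runs (or points) scored and allowed."""
    return scored ** exponent / (scored ** exponent + allowed ** exponent)


# Hypothetical baseball team: 800 runs scored, 650 allowed, exponent 2.
baseball = pythagorean_win_pct(800, 650)

# Hypothetical basketball team: 8,500 points scored, 8,200 allowed,
# using the exponent of 13.91 quoted in the note.
basketball = pythagorean_win_pct(8500, 8200, exponent=13.91)

print(round(baseball, 3), round(basketball, 3))
```

A larger exponent means that a given ratio of points scored to points allowed translates into a more lopsided expected record, which is why the same formula needs a different exponent in each sport.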

4. Steven Miller, “A Derivation of the Pythagorean Won-Loss Formula in Baseball,” Chance Magazine 20, no. 1 (2007), 40–48: 9698. The distribution in question is the Weibull distribution.

5. In 2002, while the A’s did struggle in May, they had righted the ship by mid-June, reaching a winning percentage of .550 on May 18 and never looking back. They exceeded their expected number of wins by 3 to 6 from June 7 on.

6. Dan Fox, “Circle the Wagons: Running the Bases Part III,” Hardball Times, http://www.hardballtimes.com/main/article/circle-the-wagons-running-the-bases-part-iii/, 2005.

7. James Click, “Station to Station: The Expensive Art of Baserunning,” in Baseball Prospectus 2005 (New York: Workman Publishing, 2005), pp. 511–519.

8. Ben Baumer and Peter Terlecky, “Improved Estimates for the Impact of Baserunning in Baseball,” JSM Proceedings, Statistics in Sports Section, 2010.

9. See, for instance, Hirsch and Hirsch, The Beauty of Short Hops, passim.

10. The 67 percent is known as the R² or the coefficient of determination.
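
For readers who want to see how such a figure is obtained, here is a minimal sketch of the usual calculation: one minus the ratio of the residual sum of squares to the total sum of squares. The run totals and model predictions below are hypothetical and are not the data behind the 67 percent.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation around the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical team run totals and hypothetical model predictions.
actual_runs = [700, 750, 800, 650]
predicted_runs = [710, 740, 780, 670]

print(r_squared(actual_runs, predicted_runs))
```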

11. Lewis, Moneyball, p. 128.

12. Jim Furtado, “Introducing XR,” Baseball Think Factory, 1999. http://www.baseballthinkfactory.org/btf/scholars/furtado/articles/IntroducingXR.htm.

13. There are obvious limitations to linear weights formulas, but they have proven to be effective, especially when applied to large samples. The oft-mentioned Weighted On-Base Average (wOBA) is a linear weights measure scaled to conform to typical values of OBP. For a comparison of these methods and their effectiveness, see Albert and Bennett’s Curveball, p. 230, or Colin Wyers, “The Great Run Estimator Shootout,” Hardball Times, April 16, 2009.

14. Lewis, Moneyball, p. 128.

15. For a nice primer on this issue, see Alan Schwarz, “New Baseball Statistic, With a Nod to an Old Standard,” New York Times, February 25, 2007. Schwarz cites Victor Wang, “The OBP/SLG Ratio: What Does History Say?” By the Numbers, August 2008. Wang arrived at his estimate of 1.8 via trial and error, by finding the value x such that x * OBP + SLG produced the best fit to runs scored. A more mathematically precise way of estimating the value of x is to take the ratio of the coefficients of OBP and SLG from a multiple regression model. This confirms that the optimal choice of x is about 1.84. A common misinterpretation of this result is that OBP is 1.8 times as “important” as SLG. This is not accurate, since OBP and SLG have different scales. However, we can arrive at an answer to this question by taking natural logarithms on both sides of the regression model. The ratio of the coefficients in this model is 1.5, suggesting that returns to percentage increases in OBP are 50 percent higher than returns to percentage increases in SLG.
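
The regression approach described in this note can be sketched as follows. The data are synthetic: they are generated from assumed coefficients of 2,760 for OBP and 1,500 for SLG, chosen only so that the recovered ratio is 1.84, and they are not real team statistics. The helper functions `solve_linear_system` and `ols` are illustrative names, written with the standard library so the example is self-contained.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic team seasons (assumed values, not real data).
obp = [0.310, 0.320, 0.330, 0.340, 0.350, 0.325]
slg = [0.400, 0.390, 0.430, 0.410, 0.450, 0.440]
runs = [-900 + 2760 * o + 1500 * s for o, s in zip(obp, slg)]

# Levels model: runs = b0 + b1 * OBP + b2 * SLG, with x estimated as b1 / b2.
_, b_obp, b_slg = ols([obp, slg], runs)
ratio = b_obp / b_slg

# Log-log model: the coefficients are elasticities, so their ratio compares
# returns to percentage increases in OBP and SLG.
log_fit = ols(
    [[math.log(o) for o in obp], [math.log(s) for s in slg]],
    [math.log(r) for r in runs],
)

print(round(ratio, 2))
```

The second fit illustrates the log transformation mentioned above. On synthetic data its coefficients carry no baseball meaning, so no value is reported for them here.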

16. Lewis, Moneyball, p. 128.

17. Lewis, Moneyball, p. 18.

18. Silver, The Signal and the Noise.

19. An average player’s performance tends to improve up to approximately twenty-nine years of age, to level off for three years and then to begin a slow decline. Thus, we would not expect the coordinates of the dots to be identical even if batting average measured pure skill; rather, we would expect them to be close. For each player, the dots may display a temporal pattern, but this pattern would not appear in a chart of all players unless there were demographic shifts in the playing population over time.

20. A common, mathematically sensible way to model a batter’s hitting ability is with a multinomial distribution. That is, identify a finite number of outcomes of a plate appearance (e.g., single, double, triple, home run, walk, hit-by-pitch, strikeout, ground out, fly out, and a catch-all category for all other outcomes) and then model each batter as having a fixed probability of ending his plate appearance with each of these outcomes. This defines a multinomial distribution, and each batter’s hitting ability can be described by a vector of probabilities that sum to 1. For example, Prince Fielder singles in 13 percent of his plate appearances, doubles in 5 percent, homers in 6 percent, etc. See, for example, Brad Null, “Modeling Baseball Player Ability with a Nested Dirichlet Distribution,” Journal of Quantitative Analysis in Sports 5, no. 2 (2009).
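
The multinomial model described in this note can be sketched as a probability vector plus a random draw per plate appearance. Only the single, double, and home run rates below come from the Fielder example; every other probability is a hypothetical placeholder chosen so that the vector sums to 1. This is a plain multinomial simulation, not the nested Dirichlet model of the cited paper.

```python
import random
from collections import Counter

# Probability of each plate appearance outcome. The single, double, and
# home_run values are from the note; all others are hypothetical.
OUTCOME_PROBS = {
    "single": 0.13,
    "double": 0.05,
    "triple": 0.005,
    "home_run": 0.06,
    "walk": 0.12,
    "hit_by_pitch": 0.01,
    "strikeout": 0.18,
    "ground_out": 0.20,
    "fly_out": 0.20,
    "other": 0.045,
}


def simulate_plate_appearances(probs, n, seed=0):
    """Draw n independent plate appearance outcomes from a fixed vector."""
    rng = random.Random(seed)
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)


# One hypothetical season of 600 plate appearances.
results = simulate_plate_appearances(OUTCOME_PROBS, 600)
counts = Counter(results)
print(counts.most_common(3))
```

Because each plate appearance is drawn independently from the same fixed vector, the season totals in `counts` follow a multinomial distribution.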

21. In descending order of career strikeout rate up through the 2011 season: Mark Reynolds (33.2 percent), Russell Branyan (32.9 percent), Bo Jackson (32.0 percent), Jack Cust (31.7 percent), and Rob Deer (31.2 percent). Lest we give striking out too bad a name, it does negate a potential negative: hitting into a double (or triple) play.

22. We have used other nonstandard acronyms. WK is all walks and hit-by-pitches combined, since walks and hit-by-pitches have exactly the same effect. UBB are unintentional walks. HBIP is “hits on balls in play,” so the critical BABIP ratio is HBIP/BIP. The OUT circle simply captures all nonhits on balls in play, with GO and AO representing ground outs and air outs respectively.
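
As a small worked example of these acronyms, the sketch below computes WK and the BABIP ratio from a batting line. It assumes the conventional definitions HBIP = H - HR and BIP = AB - K - HR + SF, which this note does not spell out, and the batting line itself is hypothetical.

```python
def walks_plus_hbp(bb, hbp):
    """WK: all walks and hit-by-pitches combined."""
    return bb + hbp


def babip(ab, h, hr, k, sf):
    """BABIP as HBIP / BIP, under the conventional definitions (assumed)."""
    hbip = h - hr           # hits on balls in play
    bip = ab - k - hr + sf  # balls in play
    return hbip / bip


# Hypothetical season line, not a real player.
wk = walks_plus_hbp(bb=70, hbp=5)
season_babip = babip(ab=550, h=160, hr=30, k=120, sf=5)
print(wk, round(season_babip, 3))
```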
