A Promising New Run Estimator: Base Runs
Brandon Heipp
Runs Created and Linear Weights are two examples of run estimators, statistics that attempt to derive runs scored from
the components of a batting line. Here, the author summarizes a new statistic
developed by David Smyth, which is potentially more accurate than others.
Run estimators are an extremely common topic in sabermetrics, much to the annoyance of some who would like to get past the issue and move on to something else. Personally, I am fascinated by them. Most new run estimators are simply linear weights formulas with slightly different coefficients than previous methods. However, some recent work by David Smyth shows promise as a genuine alternative to preexisting measures.
About a year ago, Smyth wrote an article on his Base Runs method, which is available
at http://www.baseballstuff.com/fraser/articles/basenew.html. He has recently updated his work on the Strategy and Sabermetrics
forum at FanHome.com (www.fanhome.com).
Base Runs (BsR) is a team context formula, like Runs Created. It tries to measure the interaction between offensive events, not just their simple linear values. The basis for BsR is that runs scored equals home runs plus the number of baserunners times the percentage of baserunners who score. As an equation, this is:
BsR = A*B/(B+C) + D
A represents baserunners, B is advancement, C is outs, and D is home runs.
B/(B+C) is the estimated percentage of baserunners who score,
an estimator that Smyth found gives good empirical results.
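As a minimal sketch of the equation above, the Python function below computes Base Runs from the four factors. The function name `base_runs` and the sample factor values are illustrative assumptions, not taken from Smyth's work. Returning only D when B + C is zero is a choice made for this sketch so that an empty batting line does not divide by zero.

```python
def base_runs(a, b, c, d):
    """Return A*B/(B+C) + D, the core Base Runs equation.

    a: baserunners, b: advancement, c: outs, d: home runs.
    """
    if b + c == 0:
        # Assumption for this sketch: with no advancement and no outs
        # there is no score rate to estimate, so count only D.
        return float(d)
    score_rate = b / (b + c)  # estimated share of baserunners who score
    return a * score_rate + d


# Made-up factor values, chosen only to keep the arithmetic easy to follow:
# score rate = 1 / (1 + 3) = 0.25, so BsR = 12 * 0.25 + 2 = 5.0
print(base_runs(12, 1, 3, 2))
```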
Smyth gives these equations for the factors:
A = H + W + HB - CS - HR
B = (2.5*TB - H - 5*HR + 2*SB + .05*(W+HB))*X
C = AB - H + SF
Where X is set so that Base Runs equals Runs for whatever unit
is being tested (team, league, etc.), and is historically around .535. A version
that I have rigged up is useful in that it looks at just the basic stats (AB, H, TB, W, HR):
A = H+W-HR
B = (2*TB - H - 4*HR + .05*W)*X
C = AB-H
X is historically around .78
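Since X is defined only as the value that makes Base Runs equal actual runs, it may help to see how it could be solved for. Writing B' for the advancement factor before it is multiplied by X, setting R = A*B'*X/(B'*X + C) + D and rearranging gives X = (R - D)*C / (B'*(A - (R - D))). The sketch below applies that to the basic version. The function names and the sample batting line are hypothetical, and the algebra is a rearrangement of the equation above rather than something quoted from Smyth's article.

```python
def basic_factors(ab, h, tb, w, hr):
    """Return (A, B before scaling by X, C, D) for the basic version."""
    a = h + w - hr
    b_raw = 2 * tb - h - 4 * hr + 0.05 * w
    c = ab - h
    return a, b_raw, c, hr


def basic_base_runs(ab, h, tb, w, hr, x=0.78):
    """Basic Base Runs; x defaults to the historical value of about .78."""
    a, b_raw, c, d = basic_factors(ab, h, tb, w, hr)
    b = b_raw * x
    if b + c == 0:
        return float(d)  # sketch-only guard against dividing by zero
    return a * (b / (b + c)) + d


def solve_x(runs, ab, h, tb, w, hr):
    """Solve for the X that makes basic Base Runs equal actual runs."""
    a, b_raw, c, d = basic_factors(ab, h, tb, w, hr)
    scored = runs - d  # runs not accounted for by the home run term
    if b_raw <= 0 or a <= scored:
        raise ValueError("no positive X reproduces this run total")
    return scored * c / (b_raw * (a - scored))


# Hypothetical team season, not real data.
line = dict(ab=5500, h=1450, tb=2300, w=550, hr=160)
x = solve_x(750, **line)
print(round(x, 3))                       # comes out near .75
print(round(basic_base_runs(**line, x=x), 1))  # reproduces the 750 runs
```

Fitting X to a whole league and then reusing it for each team would follow the same pattern, with league totals passed to `solve_x`.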
The most intriguing part of Base Runs is that Smyth claims that
it is more accurate at the extremes than other run creation methods. Here are
estimates for a few extreme teams using the basic versions of Base Runs (BsR), Extrapolated Runs (XR), and Runs Created (RC). Team A makes 500 outs with no baserunners. Team B hits 500 HR with
no outs. Team C draws 500 walks with no outs:
Team    BsR     XR     RC
A         0    -48      0
B       500    720   2000
C       500    170      0
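The BsR and RC columns can be checked with a short sketch. It uses the basic Base Runs factors given earlier with X = .78, and for Runs Created it assumes the basic formula (H + W)*TB/(AB + W), which this article does not spell out. The XR column is left out because the coefficients for the basic version of Extrapolated Runs are not listed here. The batting lines encode each team as described in the text, and the function names are illustrative.

```python
def basic_base_runs(ab, h, tb, w, hr, x=0.78):
    """Basic Base Runs from AB, H, TB, W, and HR."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * x
    c = ab - h
    if b + c == 0:
        return float(hr)  # sketch-only guard against dividing by zero
    return a * (b / (b + c)) + hr


def basic_runs_created(ab, h, tb, w):
    """Assumed basic Runs Created: (H + W) * TB / (AB + W)."""
    return (h + w) * tb / (ab + w)


# The three extreme teams, written out as batting lines.
teams = {
    "A": dict(ab=500, h=0, tb=0, w=0, hr=0),        # 500 outs
    "B": dict(ab=500, h=500, tb=2000, w=0, hr=500), # 500 home runs
    "C": dict(ab=0, h=0, tb=0, w=500, hr=0),        # 500 walks
}

for name, line in teams.items():
    bsr = basic_base_runs(**line)
    rc = basic_runs_created(line["ab"], line["h"], line["tb"], line["w"])
    print(f"{name}: BsR={bsr:.0f}  RC={rc:.0f}")
```

With no outs, B/(B+C) equals 1 whatever X is, so every baserunner on Teams B and C is estimated to score, which is why BsR lands on 500 in both cases.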
In each case, BsR gives a more reasonable estimate than either
XR or RC. But to be a useful formula, it must predict runs with similar accuracy
in normal conditions. Here are the Root Mean Square Errors (RMSE) for each of the formulas,
predicting team runs in the period 1980-2000:
Method   RMSE
XR       23.7
BsR      23.9
RC       25.8
The accuracy of Base Runs is comparable to that of Extrapolated
Runs, and superior to that of Runs Created, at least in this sample.
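For readers who want to run the same kind of comparison, RMSE is the square root of the average squared difference between estimated and actual runs. The sketch below shows the calculation. The team totals in it are invented for illustration and are not the 1980-2000 data behind the table, and the function name is an assumption.

```python
import math


def rmse(estimated, actual):
    """Root mean square error between two equal-length sequences."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented run totals for three teams (not real data).
actual_runs = [700, 750, 800]
estimated_runs = [710, 740, 830]
print(round(rmse(estimated_runs, actual_runs), 1))
```

Because the errors are squared before averaging, one badly missed team raises RMSE more than several small misses of the same total size.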
One final note about Base Runs is that it is a team method, like
Runs Created, and can cause distortion when applied to players. To properly evaluate
players by Base Runs, a theoretical team mechanism like the one Bill James uses in his New RC is necessary.
If you are interested, please check out Smyth's work at the aforementioned
sites. Thanks to him for his research and his willingness to share it with all.