Base Runs: A Promising New Run Estimator

Brandon Heipp

Runs Created and Linear Weights are two examples of run estimators, statistics that attempt to derive runs scored from the components of a batting line. Here, the author summarizes a new statistic developed by David Smyth, which is potentially more accurate than others.

Run estimators are an extremely common topic in sabermetrics, much to the annoyance of some who would like to get past the issue and move on to something else. Personally, I am fascinated by them. Most newly proposed run estimators are simply linear weights formulas with slightly different coefficients than previous methods. However, some recent work by David Smyth shows promise as a genuine alternative to preexisting measures.

About a year ago, Smyth wrote an article on his Base Runs method, which is available at http://www.baseballstuff.com/fraser/articles/basenew.html. He has recently updated his work on the Strategy and Sabermetrics forum at FanHome.com (www.fanhome.com).

Base Runs (BsR) is a team context formula, like Runs Created. It tries to measure the interaction between offensive events, not just their simple linear values. The basis for BsR is that runs scored equals home runs plus the number of baserunners times the percentage of baserunners who score. As an equation, this is:

BsR = A*B/(B+C) + D

A represents baserunners, B is advancement, C is outs, and D is home runs.

B/(B+C) is the estimated percentage of baserunners who score, an estimator that Smyth found gives good empirical results.

Smyth gives these equations for the factors:

A = H + W + HB - CS - HR

B = (2.5*TB - H - 5*HR + 2*SB + .05*(W+HB))*X

C = AB - H + SF

Here D is simply HR, and X is set so that Base Runs equals Runs for whatever unit is being tested (team, league, etc.); historically it is around .535. A version that I have put together is useful in that it looks at just the basic stats (AB, H, TB, HR, W):

A = H + W - HR

B = (2*TB - H - 4*HR + .05*W)*X

C = AB - H

For this version, X is historically around .78.
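
As a rough illustration of the basic version, here is a minimal Python sketch. The function names base_runs_basic and solve_x are my own labels, not Smyth's, and the batting line at the bottom is made up purely to exercise the code. The closed form in solve_x is just the BsR equation rearranged for X; it assumes the team has some outs, some advancement, and more baserunners than non-homer runs.

```python
def base_runs_basic(ab, h, tb, hr, w, x=0.78):
    """Basic-stats Base Runs, using the A, B, C factors listed above and D = HR."""
    a = h + w - hr                               # baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w) * x     # advancement
    c = ab - h                                   # outs
    d = hr                                       # home runs
    if b + c == 0:
        # Assumption of this sketch: with no advancement and no outs,
        # count only the home runs rather than divide by zero.
        return float(d)
    return a * b / (b + c) + d


def solve_x(ab, h, tb, hr, w, runs):
    """X that makes basic Base Runs equal actual runs for one unit.

    Rearranging  runs = A*(B0*X)/(B0*X + C) + D  for X gives
    X = (runs - D)*C / (B0*(A - (runs - D))), where B0 is B before scaling.
    """
    a = h + w - hr
    b0 = 2 * tb - h - 4 * hr + 0.05 * w
    c = ab - h
    non_hr_runs = runs - hr
    return non_hr_runs * c / (b0 * (a - non_hr_runs))


# Hypothetical team line, not real data.
line = dict(ab=5500, h=1450, tb=2300, hr=160, w=550)
estimate = base_runs_basic(**line)
print(round(estimate, 1))
print(round(solve_x(runs=estimate, **line), 3))  # recovers the X that was used
```

Feeding the estimate back into solve_x is only a round-trip check; in practice the runs argument would be the actual runs scored by the team or league being fitted.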

The most intriguing part of Base Runs is that Smyth claims that it is more accurate at the extremes than other run creation methods. Here are estimates for a few extreme teams using the basic versions of Base Runs, Extrapolated Runs, and Runs Created. Team A makes 500 outs, no baserunners. Team B hits 500 HR, no outs. Team C draws 500 walks, no outs:

Team    BsR     XR      RC
A         0    -48       0
B       500    720    2000
C       500    170       0
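
The BsR and RC figures for these three teams can be checked with a short Python sketch. It assumes X = .78 for the basic BsR version and assumes that "basic" Runs Created means the standard (H + W)*TB/(AB + W); the function names are mine. XR is left out because its coefficients are not given in this article.

```python
X = 0.78  # historical value quoted for the basic version


def base_runs_basic(ab, h, tb, hr, w, x=X):
    a = h + w - hr                               # baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w) * x     # advancement
    c = ab - h                                   # outs
    if b + c == 0:
        return float(hr)  # assumption: avoid 0/0 when nothing happens
    return a * b / (b + c) + hr


def runs_created_basic(ab, h, tb, w):
    # Assumed basic Runs Created: (H + W) * TB / (AB + W)
    return (h + w) * tb / (ab + w)


teams = {
    "A": dict(ab=500, h=0, tb=0, hr=0, w=0),         # 500 outs, no baserunners
    "B": dict(ab=500, h=500, tb=2000, hr=500, w=0),  # 500 HR, no outs
    "C": dict(ab=0, h=0, tb=0, hr=0, w=500),         # 500 walks, no outs
}

for name, t in teams.items():
    bsr = base_runs_basic(**t)
    rc = runs_created_basic(t["ab"], t["h"], t["tb"], t["w"])
    print(name, round(bsr), round(rc))
```

Under those assumptions the loop prints 0, 500, and 500 for BsR and 0, 2000, and 0 for RC, matching the two columns in the table.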

BsR is the only one of the three that gives a reasonable estimate for every team: XR goes negative for Team A and undervalues Team C's walks, while RC quadruples Team B's runs and credits Team C with none. But to be a useful formula, it must predict runs with similar accuracy in normal conditions. Here are the Root Mean Square Errors for each of the formulas, predicting team runs in the period 1980-2000:

Method  RMSE
XR      23.7
BsR     23.9
RC      25.8

The accuracy of Base Runs is comparable to that of Extrapolated Runs, and superior to that of Runs Created, at least in this sample.
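
For readers unfamiliar with the measure, Root Mean Square Error is the square root of the average squared difference between estimated and actual runs. A minimal Python sketch follows; the function name and the three team totals are invented for illustration and are not the 1980-2000 data behind the table.

```python
import math


def rmse(estimated, actual):
    """Root mean square error between paired estimates and actual values."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimated, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up estimates and actual run totals for three hypothetical teams.
estimated_runs = [750.0, 812.0, 698.0]
actual_runs = [741, 830, 705]
print(round(rmse(estimated_runs, actual_runs), 1))
```

A lower value means the estimator's team totals sit closer to actual runs on average, with large misses penalized more heavily than small ones.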

One final note about Base Runs is that it is a team method, like Runs Created, and can cause distortion when applied to players. To properly evaluate players by Base Runs, a theoretical team mechanism like the one Bill James uses in his New RC is necessary.

If you are interested, please check out Smyth's work at the aforementioned sites. Thanks to him for his research and his willingness to share it with all.