Base Runs
The link below is for an article that was printed in the August 2001 edition of By the Numbers, the newsletter
of SABR's Statistical Analysis Committee, with minor modifications, about David Smyth's outstanding Base Runs method.
Below the link is some additional information about BsR.
Breaking Down BsR
It is sometimes useful to write a stat like Base Runs in rate form. For one thing, it helps
greatly in building the Theoretical Team equations, and it is also useful to be able to write BsR completely
in terms of BA, OBA, SLG, and HR/PA. To do this, start with each component and divide it by PA, giving
A/PA, B/PA, C/PA, and D/PA. (Since I am using a basic version of Base Runs, PA = AB+W.) You can call these,
respectively, Runners On Base Average (ROBA), Advancement Factor (AF), 1-OBA, and HR/PA. Then:
BsR/PA = ROBA*AF/(AF+1-OBA)+HR/PA
For the Basic version I use, these are the equations for each component:
ROBA = (H+W-HR)/(AB+W) = OBA-HR/PA
AF = (2*TB-H-4*HR+.05*W)*.78/(AB+W) = ((2*SLG-BA)*(1-OBA)/(1-BA)-4*HR/PA+.05*(OBA-BA)/(1-BA))*.78
1-OBA = 1-(H+W)/(AB+W)
HR/PA = HR/(AB+W)
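As a quick check on the rate form, here is a minimal Python sketch that computes basic BsR two ways: from counting stats, and from BA, OBA, SLG, and HR/PA alone. The function names and the stat line are made up for illustration; the only assumption beyond the equations above is that AF is B divided by PA.

```python
def bsr_counting(ab, h, tb, hr, w):
    """Basic BsR from counting stats: A*B/(B+C) + D."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = ab - h
    return a * b / (b + c) + hr


def bsr_per_pa(ba, oba, slg, hr_pa):
    """BsR/PA written only in terms of BA, OBA, SLG, and HR/PA (PA = AB + W)."""
    ab_pa = (1 - oba) / (1 - ba)   # AB/PA
    w_pa = (oba - ba) / (1 - ba)   # W/PA
    roba = oba - hr_pa
    af = ((2 * slg - ba) * ab_pa - 4 * hr_pa + 0.05 * w_pa) * 0.78
    return roba * af / (af + 1 - oba) + hr_pa


# Made-up stat line, for illustration only.
ab, h, tb, hr, w = 550, 160, 270, 25, 60
pa = ab + w
total = bsr_counting(ab, h, tb, hr, w)
rate = bsr_per_pa(h / ab, (h + w) / pa, tb / ab, hr / pa)
print(round(total, 2), round(rate * pa, 2))
```

The two printed figures should agree, since the rate form is just the counting form with every factor divided by PA.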
In the Base Runs article linked above, I gave the equations that I use for each factor in this basic version. The
B multiplier is based on the composite MLB stats of 1946-1995. In this period, the averages for each component are:
ROBA    AF      OBA     HR/PA
.303    .308    .325    .0222
You can use these to put together the Theoretical Team factors. The TT concept, which I will not explain here in
every detail, is that since Base Runs (or Runs Created) is a run estimator devised for estimating team runs, there is an
interaction between the values of the offensive events. As offensive production increases, the value of each event goes up (with
the exception of the special case, the HR). So applying BsR directly to Babe Ruth gives him an unfair advantage, because he is not
playing on a team by himself; he is playing on a team with 8 other players. The TT formula therefore puts the player on
a team with 8 average players, and we assume that each player on the theoretical team gets the same number of PA as
our player. The team's new A factor can then be calculated as (A+LgROBA*PA*8), where A is the individual's A factor. You
apply the same technique to the B, C, and D terms, using the long-term averages above (you really should have a separate version
for each year, but small changes in ROBA, AF, etc. don't significantly change the results of the formula).
Then, to see how much the player has helped this team, we compare him to a team of 8 average players in his number of
PA each. If we wanted to compare the player to the league average, we would compare him to 9 average players. If you
work all this out and simplify, you get this equation for TT BsR, which I like to call Individual Base Runs(IBR).
IBR = (A+2.42PA)(B+2.46PA)/(B+C+7.86PA)+HR-.76PA
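Here is a minimal Python sketch of the IBR formula above, next to straight BsR for the same hitter. The function names and the stat line are hypothetical; the coefficients are the rounded ones printed in the formula.

```python
def basic_factors(ab, h, tb, hr, w):
    """A, B, C, D for the basic version of BsR."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = ab - h
    return a, b, c, hr


def straight_bsr(ab, h, tb, hr, w):
    a, b, c, d = basic_factors(ab, h, tb, hr, w)
    return a * b / (b + c) + d


def ibr_basic(ab, h, tb, hr, w):
    """Individual Base Runs with the standard-league coefficients."""
    a, b, c, d = basic_factors(ab, h, tb, hr, w)
    pa = ab + w
    return (a + 2.42 * pa) * (b + 2.46 * pa) / (b + c + 7.86 * pa) + d - 0.76 * pa


# Made-up stat line for a good hitter, for illustration only.
line = dict(ab=550, h=160, tb=270, hr=25, w=60)
print(round(straight_bsr(**line), 1), round(ibr_basic(**line), 1))
```

Because the coefficients are rounded to two or three digits, results from this form can differ a little from a version built directly from the unrounded league rates.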
Lest it seem as if I am taking credit for coming up with all of this, the pioneering TT work was done by Dave Tate and
Bill James, and the application of the TT concept to BsR was also the work of David Smyth.
Stolen Base BsR
It is useful and necessary to get some more categories into a Runs Created formula, so here we'll add SB and CS (this
is again based on Smyth's work). The other categories we could add, like SF, SH, and DP, I choose to ignore. For
one, they are very situation dependent, and therefore I'm not 100% comfortable including them in an individual formula; secondly,
and more importantly, I am lazy and don't want to deal with them. Anyway, for BsR including SB:
A = H + W - HR - CS
B = (2*TB - H - 4*HR + .05*W +1.5*SB)*.76
C = AB-H
The IBR formula for the standard league is:
IBR =(A+2.34PA)(B+2.58PA)/(B+C+7.98PA)+HR-.76PA
Here A/PA and B/PA are no longer ROBA and AF; I call the new rates AROBA and AAF, for "advanced". Anyway, the long-term averages
are:
AROBA   AAF     OBA     HR/PA
.293    .323    .325    .0222
Full BsR
Here is a version of the BsR formula that you can use if you have all of the minor (SH, SF, DP, etc.) offensive stats. It
is not as clean and nice looking as the other versions on this page, but there needs to be more give-and-take between
the various events when you include the other stats. It is also not straightforward which events should be placed in
which factor(s). I took the convention that A is final baserunners: baserunners less those who we know have been thrown out
on the bases or taken out on a DP. Everything goes in B to balance everything out and produce good linear weights, while
C is batting outs. D remains home runs. There are other ways to define these terms, and Smyth, TangoTiger, and Robert Dudek
have all done so in different ways than I have. There are certainly arguments to be made for all of the different approaches,
but a discussion of that will have to wait for another day.
A = H + W + HB - HR - CS - DP
B = .777S + 2.61D + 4.29T + 2.43HR + .03(W + HB - IW) - .747IW + 1.30SB + .13CS + 1.08SH + 1.81SF + .70DP -.04(AB-H)
C = AB - H + SH + SF
If you want to include strikeouts, they go in this B factor which is coupled with the A and C factors given above:
B = .781S + 2.61D + 4.28T + 2.42HR + .034(W + HB - IW) - .741IW + 1.29SB + .125CS + 1.07SH + 1.81SF + .69DP - .029(AB-H-K) - .086K
Finding the B Multiplier
The B multiplier is designed so that the BsR formula will produce the correct number of runs for the entity you are using.
B is the factor to calibrate because A as baserunners, C as outs, and D as home runs all have straightforward and obvious formulas.
You can calculate, based on A, C, and D, the actual B factor required to equate BsR with R, by this formula: (R-D)*C/(A-R+D).
What can you do with the actual B value? For one thing, if you already have a set formula for B(ignoring the multiplier),
you can divide actual B by estimated B to get the correct multiplier. Another thing you can do is run a regression to
find weights for TB, H, etc. by using those stats to predict Actual B, or use other approaches like trial and error, etc.
All of these approaches had a role in finding the B component used in the official versions of BsR.
An alternate way to find B is to calculate Z=(R-D)/A, then B=Z*C/(1-Z). It is longer and more
complicated, but it is equivalent. (I include it because it was the way I did it until I took the time to work out the
algebra to derive the other formula.)
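A small Python sketch of both routes to the actual B, plus the multiplier you get by dividing actual B by an unscaled B estimate. The team totals are made up, and the function names are only for illustration.

```python
def actual_b(r, a, c, d):
    """B required for A*B/(B+C) + D to equal R exactly."""
    return (r - d) * c / (a - r + d)


def actual_b_via_z(r, a, c, d):
    """Same thing through the score rate Z = (R-D)/A."""
    z = (r - d) / a
    return z * c / (1 - z)


def b_multiplier(r, a, c, d, raw_b):
    """Multiplier that scales an unscaled B estimate up or down to actual B."""
    return actual_b(r, a, c, d) / raw_b


# Made-up team totals, for illustration only.
r, a, c, d = 750, 1900, 4100, 150
raw_b = 2650  # hypothetical value of 2*TB - H - 4*HR + .05*W before the multiplier
b = actual_b(r, a, c, d)
print(round(b, 2), round(actual_b_via_z(r, a, c, d), 2), round(b_multiplier(r, a, c, d, raw_b), 3))
```

Plugging the resulting B back into A*B/(B+C)+D returns the team's actual runs, which is the whole point of the calibration.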
Building the TT BsR Formula
Here are the technical steps for building the TT formula. These are not very interesting for most people, but
hard core sabermetricians may find them useful (although hard core sabermetricians probably already know how to do it themselves):
IBR can be written as:
(A+X*PA)(B+Y*PA)/((B+Y*PA)+(C+Z*PA))+HR+T*PA-V*PA
which simplifies to:
(A+X*PA)(B+Y*PA)/(B+C+(Y+Z)*PA)+HR-(V-T)*PA
where X is the remainder of team ROBA
Y is the remainder of team AF
Z is the remainder of team 1-OBA
T is the remainder of team HRPA
V is the R/PA for the comparison lineup multiplied by the number of players in the comparison lineup
OK, since we always add the player to a team with 8 average players:
X = LgROBA*8    Y = LgAF*8    Z = (1-LgOBA)*8    T = LgHR/PA*8
Depending on what baseline we use though, V will vary. For absolute runs, we compare the player to a team with
8 average hitters. For runs above average, we compare the player to a team with 9 average hitters. For runs above
replacement, we compare the player to a team with 8 average hitters plus one replacement level hitter. So, it is very
straightforward to find V for absolute: 8*LgBsR/PA. For average, V = 9*LgBsR/PA.
For replacement, we need to first set a replacement level, and then determine what ROBA, AF, OBA, and HRPA a replacement
player will have. I assume 25 batting outs (AB-H) per game, and use BsR/PA to calculate the R/G for the league as (BsR/PA)/(1-OBA)*25,
since BsR/O = (BsR/PA)/(1-OBA). (Keep in mind that BsR/PA = ROBA*AF/(AF+1-OBA) + HRPA.) Then, I assume
the replacement rate is 1 run/game below average, so I take that R/G, subtract 1, and divide by 25. This is the replacement
player's R/O. In the standard league we are using, BsR/PA = .117, R/O = .173, and RepR/O (R/O for the replacement)
= .133. Then we need to find the value X (a deflator, not the same X as in the TT formula above) by which each component
stat for the league (ROBA, AF, OBA, and HRPA) needs to be multiplied for R/O to equal .133. We multiply each term in the
BsR/O formula by X. This, when simplified, gives this equation:
(RAX^2/(1+X(A-O))+HX)/(1-OX) = Rep R/O
R is LgROBA, A is LgAF, O is LgOBA, and H is LgHRPA. I have no idea how to solve for X by hand, but my TI-83 calculator
will do it, and it gives .89 for the standard league (this will all vary based on the league offensive levels, and of course
on how you personally choose to define replacement rate). Anyway, we then multiply each component by .89 to find what we expect
our replacement to hit:
ROBA    AF      OBA     HRPA
.269    .274    .289    .02
So this gives him a BsR/PA of .095. We then calculate the V value for the replacement baseline as 8*LgBsR/PA+RepBsR/PA.
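If you don't have a calculator with a solver handy, the deflator can be found numerically. This is a minimal Python sketch using plain bisection (my choice of method, not anything from the original work), with the standard-league rates and the 25 outs per game and 1 run per game assumptions from the text. The function names are illustrative.

```python
LG_ROBA, LG_AF, LG_OBA, LG_HRPA = 0.303, 0.308, 0.325, 0.0222


def runs_per_out(x):
    """BsR per out when every league rate is multiplied by the deflator x."""
    r, a, o, h = LG_ROBA * x, LG_AF * x, LG_OBA * x, LG_HRPA * x
    return (r * a / (a + 1 - o) + h) / (1 - o)


def replacement_target(outs_per_game=25, runs_below_avg=1.0):
    """Replacement R/O: league R/G minus the chosen gap, divided by outs per game."""
    lg_rg = runs_per_out(1.0) * outs_per_game
    return (lg_rg - runs_below_avg) / outs_per_game


def solve_deflator(target, lo=0.5, hi=1.0, tol=1e-9):
    """Bisection; runs_per_out rises with x on this interval."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if runs_per_out(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = solve_deflator(replacement_target())
rep_bsr_pa = runs_per_out(x) * (1 - LG_OBA * x)  # convert R/O back to R/PA
print(round(x, 2), round(rep_bsr_pa, 3))
```

With the standard-league inputs this should land on roughly the same deflator and replacement BsR/PA quoted above; other leagues or replacement definitions only require changing the constants.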
Here is a chart showing the values you need to fill in for the TT components at each baseline in the standard league:
BASELINE       X      Y      Z      T      V       V-T
Absolute       2.42   2.46   5.40   .178   .937    .759
Average        2.42   2.46   5.40   .178   1.054   .876
Replacement    2.42   2.46   5.40   .178   1.031   .853
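The chart values can be rebuilt from the league rates. This Python sketch does that and then evaluates the general TT formula; the names are hypothetical, and the replacement BsR/PA of .095 is simply taken from the text rather than re-derived.

```python
LG = {"roba": 0.303, "af": 0.308, "oba": 0.325, "hrpa": 0.0222}
REP_BSR_PA = 0.095  # replacement-level BsR/PA quoted in the text


def lg_bsr_pa(lg):
    return lg["roba"] * lg["af"] / (lg["af"] + 1 - lg["oba"]) + lg["hrpa"]


def tt_constants(lg, baseline, rep_bsr_pa=REP_BSR_PA):
    """X, Y, Z, T, V for a player added to 8 average hitters."""
    x, y = 8 * lg["roba"], 8 * lg["af"]
    z, t = 8 * (1 - lg["oba"]), 8 * lg["hrpa"]
    r = lg_bsr_pa(lg)
    v = {"absolute": 8 * r, "average": 9 * r, "replacement": 8 * r + rep_bsr_pa}[baseline]
    return x, y, z, t, v


def tt_bsr(a, b, c, hr, pa, consts):
    """(A+X*PA)(B+Y*PA)/(B+C+(Y+Z)*PA) + HR - (V-T)*PA"""
    x, y, z, t, v = consts
    return (a + x * pa) * (b + y * pa) / (b + c + (y + z) * pa) + hr - (v - t) * pa


for name in ("absolute", "average", "replacement"):
    print(name, [round(k, 3) for k in tt_constants(LG, name)])
```

A useful sanity check on the construction: a hitter with exactly league-average rates should come out at zero runs above average.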
If you want to get more complex, there is something that we have failed to address: if you really add
a player to a team, he will change the number of PA everyone in the lineup gets. A player with a higher OBA than his
teammates will generate more PA; one with a lower OBA will generate fewer. In the TT formula above, we have held PA constant.
What if we let them vary? We can calculate the OBA the team would have with the player as 8/9*LgOBA+1/9*OBA. Call
this Q. Then, figure (1-LgOBA)/(1-Q). Call this PAR, for PA-added ratio. Then, multiply every individual term (the
new A, the new B, the new C, and the new D) by PAR, and proceed as usual.
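The PAR step is short enough to sketch in a few lines of Python. The function name is made up, and the league OBA defaults to the standard-league .325.

```python
def pa_added_ratio(player_oba, lg_oba=0.325):
    """PAR = (1 - LgOBA) / (1 - Q), where Q is the team OBA with the player in the lineup."""
    q = (8 / 9) * lg_oba + (1 / 9) * player_oba
    return (1 - lg_oba) / (1 - q)


for oba in (0.280, 0.325, 0.400):
    print(oba, round(pa_added_ratio(oba), 4))
```

An average-OBA hitter gets a PAR of exactly 1, a .400 OBA hitter gets a little over 1, and a poor OBA hitter a little under, which is the direction you would expect.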
Is this worth it? Who knows. Some of these bells and whistles might wash out when you convert them to win
values. Maybe they don't. A straight linear system, though, might be correct, and it will help you keep your sanity.
Simplified TT BsR
What does all that do for you? Believe it or not, it leaves you with almost the same result you would have gotten
if you took 1/9 of the player's straight BsR and 8/9 of the player's linear BsR(based on the reference team). This is
for the absolute version only, but of course if you have the absolute RC figure, you can calculate the values above other
baselines without going through all the voodoo above.
Fundamental Structure of BsR
The fundamental structure of BsR is its key asset. That fundamental structure is based on the simple, undeniable
truth that runs scored = baserunners*% of baserunners who score + home runs. "Baserunners" does not include home runs.
Anyway, in BsR, the A factor represents baserunners and the D factor represents home runs. The % of baserunners who
score, which we'll call score rate, is estimated as B/(B+C), where B is advancement and C is outs.
Other run estimators are not backed up by a fundamental theory of how runs are scored. Runs Created's downfall
is its failure to account for the unique nature of the HR(that it always produces at least one run, and if it occurs by itself,
it will produce only one run). Static LW formulas fail to account for the fact that the value of each event varies based
on the context. BsR is based on a true equation of how runs are scored. That does not mean, though, that BsR is
the one true correct run estimator by any stretch. The equation of B/(B+C) to estimate score rate has good empirical
accuracy, but also has been found to not work very well in some circumstances(such as OBA between .500 and .800--see Tango's
article on Primer about this). Maybe score rate should be estimated in a totally different way. But the structure
of the BsR equation is sound. If we want a better run estimator, we need a better estimator of score rate.
Linear BsR
You can figure how a non-linear RC formula values each event in the context you are interested in(it can be the league,
a specific team, or even a hypothetical lineup of the same player over and over again). All you have to do is calculate
BsR for the entity, and then add one single, recompute BsR, and subtract the first figure. This is the value of one
additional single. Then you do the same with every other event, and you'll have LBsR. You have to be careful to
account for everywhere the event is involved; for example, a single not only adds a hit but also a Total Base and an At Bat.
If you run the LBsR for the long-term stats, you get these values:
LBsR = .48S+.81D+1.14T+1.50HR+.32W-.096(AB-H)
LBsR(sb) = .47S+.77D+1.07T+1.45HR+.33W+.23SB-.41CS-.093(AB-H)
Of course, you could add something other than one. You could subtract one, or add 10, or add 15. The further
you get away from 0, the more the results will vary. Adding 1000 singles will have a much different effect, even per
single, than adding 1 single. Really, as Tango has pointed out, we want to get as close to adding 0 singles as possible.
Adding .00001 singles changes the run environment and the values of the other events very little, and that is what we
are looking to do. It is sort of like a limit in calculus. Actually, I guess that's exactly what it is. We want
to find the limit of (new BsR minus old BsR) divided by X, as X approaches 0, where X is the number of the event that we are
adding. Somebody who knows a lot about calculus could probably tell me if I'm right about that, and if so, come up with
a formula to calculate the limit precisely instead of having to do trial and error in a spreadsheet.
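The "add a few, recompute, subtract" approach is easy to try outside a spreadsheet. Here is a minimal Python sketch for the basic version of BsR; the team totals are invented, and the function names are only for illustration.

```python
def bsr_basic(s, d, t, hr, w, outs):
    """Basic BsR from event counts; outs means batting outs (AB-H)."""
    h = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    return a * b / (b + outs) + hr


def marginal_value(stats, event, step):
    """(new BsR - old BsR) / step after adding `step` of one event."""
    bumped = dict(stats)
    bumped[event] += step
    return (bsr_basic(**bumped) - bsr_basic(**stats)) / step


# Made-up team totals, for illustration only.
team = {"s": 1000, "d": 280, "t": 30, "hr": 150, "w": 550, "outs": 4100}
for step in (10, 1, 0.1, 0.001):
    print(step, round(marginal_value(team, "s", step), 6))
```

Because singles enter the function through their own argument, the hit, total base, and at bat that come with each single are all picked up automatically. The printed values settle down as the step shrinks.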
I have included a spreadsheet which runs through this approach for the 1979 Pirates. You can change the data in
cells B2 to G2 to whatever you want to do this with other entities. Anyway, I show the LW generated by adding 10 of
each event, 1 of each event, .1 of each event, etc. and the same for -10, -1, -.1, etc. I have highlighted in pink the
positive and negative points at which the convergence, the limit, occurs. If you go past that(I put it at one ten-millionth,
10^-6), the values start fluctuating again. My suspicion is that this is because of the spreadsheet not having perfect
accuracy, internal rounding and the like, but I could be wrong. Anyway, you can see there is not a lot of difference.
The +10 weight for a Pirate single for instance is .4898824, the +1 is .4892998, and the limit is around .4892350. So
you really don't need to do that, but it is nice to illustrate the property.
Added 4/7/04: Using calculus, you can figure this precisely using partial derivatives. The value of the single
for instance is equal to the partial derivative of the BsR function with respect to singles. You can still do this even
if you don't know calculus, because the math works out simple with BsR. The formula winds up being:
((B+C)*(A*b+B*a)-(A*B)*(b+c))/((B+C)^2)+d
Let A, B, C, and D be the respective total factors for the entity you are interested in. Let a, b, c, and d be
the A, B, C, and D coefficients of the event you are interested in. That's it. Thank goodness all of the formulas
for the pieces of BsR are linear.
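Here is a short Python sketch of that formula applied to the basic version of BsR, using the 1946-1995 per-PA averages given earlier as A, B, and C (the formula only depends on their ratios, so per-PA rates work as well as totals). The event table and names are illustrative.

```python
def linear_weight(A, B, C, a, b, c, d):
    """Partial derivative of A*B/(B+C) + D with respect to one event."""
    return ((B + C) * (A * b + B * a) - (A * B) * (b + c)) / ((B + C) ** 2) + d


# Long-term per-PA rates: ROBA, AF, 1-OBA.
A, B, C = 0.303, 0.308, 1 - 0.325

# (a, b, c, d) coefficients of each event in the basic version.
EVENTS = {
    "S": (1, 0.78, 0, 0),
    "D": (1, 2.34, 0, 0),
    "T": (1, 3.90, 0, 0),
    "HR": (0, 2.34, 0, 1),
    "W": (1, 0.039, 0, 0),
    "O": (0, 0, 1, 0),
}

weights = {name: linear_weight(A, B, C, *coef) for name, coef in EVENTS.items()}
for name, value in weights.items():
    print(name, round(value, 3))
```

The output should sit close to the long-term LBsR weights listed above; small differences come from the rounded league rates used as inputs.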
There is a spreadsheet linked at the bottom that shows this. It is based on Tango's full BsR which is available
at:
If you don't want to deal with a category, just set the coefficients to 0. You can change the coefficients for
the other events to use any BsR equation you want all with this spreadsheet. Of course you can also change the "#" column,
which is the frequency of the event for the dataset you're using. Enjoy.
Matching LW Values
Based on the formula above for calculating the Linear Weight value of a certain event using BsR, you can also fix the B
coefficients so that they produce desired LWs. For example, on my LW page there is the ERP formula that I use, based on 1951-1998
composite major league data. Suppose I want to force my BsR formula to produce the same LW as are used in ERP. How do I go
about doing this?
Well, first, I have to clearly define which events are in the A, C, and D factors, and what coefficient they have there. For
my case, I will use S, D, T, HR, W, and O as the only events. S, D, T, and W each have a coefficient of 1 in A; O has a
coefficient of 1 in C; and HR has a coefficient of 1 in D.
Now, I need to calculate the A, C, and D factors for the entity I am working with (in my case, all teams 1951-1998). Then,
I use these to calculate what we will call ActB, the actual B value required for BsR to equal runs scored.
The formula for ActB is (R-D)*C/(A-R+D), where R is the actual runs scored we want to match.
So, now we have everything we need. a, b, c, and d are still the coefficients for the given event in the respective factors,
L is the LW value we want the event to have, and B is ActB. We can then calculate b as:
b = ((B+C)^2*(L-d)-B^2*a-B*C*a+A*B*c)/(A*C)
Voila.
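A round-trip check makes the algebra easier to trust: solve for b from a target LW, then feed that b back through the LW formula. This Python sketch does that with the ERP weights as targets. Note the assumption: A, B, and C here are the 1946-1995 per-PA averages from earlier on this page, standing in for the 1951-1998 totals and ActB, so the resulting coefficients will not match the ones quoted for that dataset.

```python
def linear_weight(A, B, C, a, b, c, d):
    return ((B + C) * (A * b + B * a) - (A * B) * (b + c)) / ((B + C) ** 2) + d


def required_b(A, B, C, a, c, d, L):
    """B coefficient that makes the event's linear weight equal L."""
    return ((B + C) ** 2 * (L - d) - B ** 2 * a - B * C * a + A * B * c) / (A * C)


# Stand-in factors (assumption): long-term per-PA rates, not the 1951-1998 data.
A, B, C = 0.303, 0.308, 1 - 0.325

# event: (a, c, d, target LW from ERP)
TARGETS = {
    "S": (1, 0, 0, 0.486),
    "D": (1, 0, 0, 0.81),
    "T": (1, 0, 0, 1.134),
    "HR": (0, 0, 1, 1.458),
    "W": (1, 0, 0, 0.324),
    "O": (0, 1, 0, -0.0972),
}

forced = {ev: required_b(A, B, C, a, c, d, L) for ev, (a, c, d, L) in TARGETS.items()}
for ev, b in forced.items():
    print(ev, round(b, 3))
```

Each forced coefficient reproduces its target weight exactly when B is held at the value used to solve for it; any drift in practice comes from B itself changing once the new coefficients are applied to real data.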
So, let's look at my ERP equation. It is (TB+W+.5H-.3(AB-H))*.324, which
has LW for S, D, T, HR, W, O of .486, .81, 1.134, 1.458, .324, -.0972. The B that
I use for BsR, (2TB-H-4HR+.05W)*.78, is:
B = .78S+2.34D+3.9T+2.34HR+.039W
Now, with all of this data, we can force the LW values. When we do this (which you can do with the spreadsheet linked at
the bottom of the page, the same one that gives the actual LW values), it seems to give a result that's accurate to .001 or
so. It might be rounding error, or it might be something else, but either way, it's pretty close. So, to match the linear
weight values I wanted, my B would be:
B = .833S+2.360D+3.888T+2.159HR+.0692W-.010(O)
Yes, the outs have to be included as well. That's kind of cumbersome if you don't want outs in B, but it's necessary
to force the values. Are you sufficiently confused yet? I am.
1979 Pirates LBsR
Linear Weights from Base Runs