While there are not quite as many winning percentage estimators as there are
run estimators, there is no shortage of them. Maybe one of the reasons is that
the primary method for determining W%, Bill James' Pythagorean method, is a fairly good method that does not have the obvious
flaws of Runs Created--although it does have some. The inadequacy of Runs Created
has always fueled innovation in the run estimation field.
BenV-L from the FanHome board has provided a classification system for win
estimators, which is a little complex but does indeed make sense. He is a genuine
math/stats guy, so I won't tread on his territory. I will propose a different
classification system that approaches it from a slightly different angle.
First, we have the general area of methods that do not vary based on the run
context. Under this, we have linear and non-linear methods. So, first we will look at static linear methods.
The static linear methods all are based in some way on runs minus runs allowed. Most of them take this general form:
W% = RD:G*S + .5
Where RD:G is Run Differential Per Game and S is slope. This is in the form of a basic linear regression, mx + b. Another
way to write this, also very common, is:
W% = RD:G/RPW + .5
Where RPW is Runs Per Win, which of course is just the reciprocal of the slope. Of course, you could call the slope Wins Per Run, but I prefer sticking with the regression
lingo.
For average major league play, the slope turns out to be
about .1, or an RPW of 10. For instance, I often use a value of .107, which is
based on a regression on 1970s data (more out of habit than anything). However,
using regression you can generate a formula that does not weight R and RA equally. One
of these methods was published by Arnold Soolman based on 1901-1970 data. He
had W% = (.102*R-.103*RA)/G + .505. This equation appears to be based on multiple
regression. While nothing forces R and RA to be given equal weight, or forces
a team that scores as many runs as it allows to be predicted at .500, it seems like the only sensible choice to me.
Looking at Soolman's formula, a team that scores and allows 4 runs per game
is predicted to play .501 baseball. This doesn't seem like a big deal, but let's
consider the case of a league that has an average of 4 runs per game. The league
would be predicted to play .501 baseball, which is obviously impossible. They
would have to play .500 baseball. That is my logic for R=RA=.500 W%, and whether
it is good enough is up to you.
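As a minimal sketch of the two linear forms just discussed (the function names are my own, and the default slope of .107 is simply the value quoted above), the .501 problem can be checked directly:

```python
def linear_wpct(r, ra, g, slope=0.107):
    """Equal-weight linear estimator: W% = RD:G * S + .5."""
    return (r - ra) / g * slope + 0.5


def soolman_wpct(r, ra, g):
    """Soolman's regression as quoted in the text: R and RA weighted unequally."""
    return (0.102 * r - 0.103 * ra) / g + 0.505


# A team that scores and allows 4 runs per game over 162 games
runs = 4 * 162
equal_weight = linear_wpct(runs, runs, 162)
soolman = soolman_wpct(runs, runs, 162)
print(round(equal_weight, 3), round(soolman, 3))
```

The equal-weight form returns exactly .500 for any team with R = RA, while the Soolman form lands at about .501 in a 4 runs per game context.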
We also have non-linear methods that use constants. Earnshaw Cook was the first to actually publish a W% estimator, and it is in this category:
W% = R*.484/RA
A team with equal R and RA would be .484, but if you use .5 instead, it will
work.
Another example is the work of Bill Kross:
if R<RA, W% = R/(2*RA)
if R>RA, W% = 1 - RA/(2*R)
Another is a method that Bill James speculated would work but never actually
used, "double the edge". It is as follows:
W% = (R/RA*2-1)/(R/RA*2)
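Here is a rough sketch of the three constant-based non-linear methods, written straight from the formulas above. The function names are mine, and sending the R = RA case through the second Kross branch is my own assumption, since the formula as given only covers R<RA and R>RA (both branches give .500 there anyway).

```python
def cook_wpct(r, ra, k=0.484):
    """Cook: W% = k * R / RA, with k = .484 as published."""
    return k * r / ra


def kross_wpct(r, ra):
    """Kross: piecewise formula; R == RA is sent to the second branch (gives .5)."""
    if r < ra:
        return r / (2 * ra)
    return 1 - ra / (2 * r)


def double_the_edge_wpct(r, ra):
    """Double the Edge: W% = (2*RR - 1) / (2*RR), where RR = R/RA."""
    rr = r / ra
    return (2 * rr - 1) / (2 * rr)
```

None of these take the scoring context into account; only the ratio of R to RA matters.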
The problem with many of these methods is that they obviously break down at
the extremes. Using a slope of .1 with the linear method causes a W% of 1 at
a RD:G of 5. But a team that scores 5.1 runs per game more than its opponent
will not play 1.01 baseball. Cook's formula produces a W% over 1 with a run ratio
over 2.07, although it doesn't allow a sub-zero W%, and isn't accurate at all. The
Kross formula simply does not provide a very accurate estimation, at least in comparison to other methods, although it does
bound W% between 0 and 1. Double the Edge does not allow a W% above 1, but if
the team's run ratio is under .5, it will produce a sub-zero winning percentage.
So every method either is inaccurate or produces impossible answers. While all of these formulas will work decently with normal teams in normal scoring contexts, we need methods
that work outside of the normal range. There are real .700 teams, and there are
teams that play in a context where the two teams average 13 runs a game. And
if we want to apply these methods to individuals at all, we definitely need a more versatile method.
Enter the Pythagorean Theorem. Bill
James' formula, W% = R^2/(R^2 + RA^2), has a high degree of accuracy and fits the constraints of 0 and 1. These attributes and its relative simplicity have made it the standard for many years. James would later proclaim that 1.83 was a more accurate exponent.
The formula by which he came to this conclusion was exponent = 2-1/(RPG-3). At
the normal RPG of 9, this does produce an exponent of 1.83, but it gives an exponent of 2.333 at 0 RPG, is undefined at 3 RPG,
and only approaches 2 as RPG goes to infinity. As we shall see later, that makes it a woefully inadequate and illogical formula.
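A small sketch of the Pythagorean formula and the RPG-based exponent attributed to James above may help here. The function names are my own, and the exponent function is just the formula as quoted, not something I have verified against James' own writing.

```python
def pythagorean_wpct(r, ra, x=2.0):
    """W% = R^x / (R^x + RA^x); x = 2 is the classic form."""
    return r**x / (r**x + ra**x)


def james_exponent(rpg):
    """Exponent = 2 - 1/(RPG - 3), as quoted in the text. Undefined at RPG = 3."""
    return 2 - 1 / (rpg - 3)


for rpg in (0, 2, 4, 9, 20, 100):
    print(rpg, round(james_exponent(rpg), 3))
```

Running the loop shows how oddly the exponent behaves at low RPG, jumping from above 2 on one side of 3 RPG to well below it on the other.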
An off-the-wall sort of formula developed by the author is based on an article
in the old STATS Pro Football Revealed work, which estimated the Z-score of winning percentage for a team and then converted
it back into a W%. I applied this idea to baseball. It is automatically bounded by 0 and 1. Anyway, I estimated
Z-score as 2.324*(R-RA)/(R+RA), and then you can use the normal cumulative function to convert it back into a W%.
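The Z-score idea can be sketched with nothing but the standard library, using the error function to get the normal cumulative distribution. The function name is mine; the 2.324 multiplier is the one given above.

```python
import math


def zscore_wpct(r, ra, k=2.324):
    """Estimate a Z-score from runs, then convert to W% with the normal CDF."""
    z = k * (r - ra) / (r + ra)
    # Standard normal CDF written in terms of the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

Because (R-RA)/(R+RA) can only range from -1 to 1, the Z-score is capped at plus or minus 2.324, so the estimate never quite reaches 0 or 1.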
Now we go into methods that vary somewhat based on the scoring context. This is normally done in terms of Runs Per Game, (R+RA)/G. First, I should just point out that it might be possible to modify the Z-score W% and the Double the Edge
method to somehow account for changing RPG, but no one has done so and since the methods aren't optimal, it would probably
be a waste of time.
The linear methods that do this simply use a formula based on RPG to estimate
RPW or slope before estimating W%. These linear formulas are still subject to
the same caveats as the static linear methods--they are not bounded by 0 and 1. But
they do add more flexibility, especially within the normal scoring ranges. There
are a number of these methods, all of which produce very similar results as BenV-L found.
The most famous of these is one developed by Pete Palmer, RPW = 10*sqrt(RPG/9).
Some others include David Smyth's (R-RA)/(R+RA) + .5 = W%, which just assumes that RPW = RPG. Ben V-L published the same formula except multiplying (R-RA)/(R+RA) by .91, making RPW = 1.099*RPG. Just for example, another one is Tango Tiger's simple RPG/2 + 5. Again, the accuracy is improved more by using any reasonable modified slope than by finding the optimum
one from out of these choices.
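To make the comparison concrete, here is a sketch of the four RPW formulas just listed, plus a helper that turns any of them into a W% estimate. All of the function names are my own labels for the methods described above.

```python
import math


def rpw_palmer(rpg):
    return 10 * math.sqrt(rpg / 9)


def rpw_smyth(rpg):
    return rpg


def rpw_benvl(rpg):
    # Multiplying (R-RA)/(R+RA) by .91 is the same as RPW = RPG/.91
    return rpg / 0.91


def rpw_tango(rpg):
    return rpg / 2 + 5


def flexible_linear_wpct(r, ra, g, rpw_func):
    """W% = RD:G / RPW + .5, with RPW estimated from RPG."""
    rpg = (r + ra) / g
    return (r - ra) / g / rpw_func(rpg) + 0.5


for f in (rpw_palmer, rpw_smyth, rpw_benvl, rpw_tango):
    print(f.__name__, round(f(9), 2))
```

At a normal 9 RPG the four formulas give RPW values within about one run of each other, which is why the choice among them matters so little.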
Of course, as we said the problems inherent in linear methods are not resolved
just by using a flexible slope. The Pythagorean model provides the bounds at
0 and 1, and is what we want to build upon. This will take the form of R^X/(R^X
+ RA^X).
There have been several published attempts to base X on RPG. One very simple one is RPG/4.6 from David Sadowski. The most
famous is Clay Davenport's "Pythagenport", X = 1.5log(RPG) + .45. Davenport used
some extreme data and modelling to find his optimal exponent, which claims to be accurate for RPG ranging between 4 and 40.
What about RPG under 4 though? Enter
David Smyth. The inventor of Base Runs, the "natural" RC function, came up with
a brilliant discovery, revelation, or what have you that allows for the finding of a better exponent. Although it is a remarkably obvious conclusion once you have been exposed to it, no one outside of Mr.
Smyth was able to think it up themselves.
The concept is very simple. The
minimum RPG possible in a game is 1, because if neither team scores, the game continues to go.
And if a team played 162 games at 1 RPG, they would win each game they scored a run and lose each time they allowed
a run. Therefore, to make W/(W+L) = R^X/(R^X + RA^X), X must be set equal to
1. This is a known point in the domain of the exponent: (1,1). Sadowski's formula would give an exponent of .22 at 1 RPG, causing a team that should go 100-62(.617) to
be predicted at .526. Davenport comes
up with .45, which would project a .554 W% for the team--closer, but still incorrect, and our formula has to work at the only
point that we know to be true.
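The known-point argument is easy to check numerically. This sketch assumes Davenport's log is base 10 (any base gives .45 at 1 RPG, since log(1) = 0), and the names are mine.

```python
import math


def pyth_wpct(r, ra, x):
    return r**x / (r**x + ra**x)


# A 100-62 team at 1 RPG scored 100 runs and allowed 62 over 162 games
r, ra, rpg = 100, 62, 1.0

known_point = pyth_wpct(r, ra, 1.0)                              # exponent forced to 1
sadowski = pyth_wpct(r, ra, rpg / 4.6)                           # exponent of about .22
davenport = pyth_wpct(r, ra, 1.5 * math.log10(rpg) + 0.45)       # exponent of .45

print(round(known_point, 3), round(sadowski, 3), round(davenport, 3))
```

Only the exponent of 1 returns the team's actual 100/162 record.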
So the search was on for an exponent that would 1) produce 1 at 1 RPG 2)maintain
accuracy for real major league teams and 3) be accurate at high RPG. If criteria
1 and 2 were met, but 3 was not, then the Davenport method would be preferable
at some times, and the new method would be preferable at others. We want a method
that can give us a reasonable estimate all of the time.
It turns out that this author, while fooling around with various regression
models fed by the known point and Davenport's exponent at other points, found
that RPG^.29 matched Davenport's method in the range where a match was desired. Although I posted it on FanHome, nobody really noticed. A few months later, David Smyth posted RPG^.287, saying that he thought it was an exponent that would fit
all of our needs. Bingo. Tango Tiger
ran some tests which are linked below and found that RPG^.28 might be the best, but the Patriot/Smyth exponent is the one
that, at least to this time, has been shown to produce the optimal results. Some
people have taken to calling this Pythagenpat, a takeoff on Pythagenport, but it should always be remembered that Smyth recognized
the usefulness of this method to a greater extent than I did and that without his (1,1) discovery, I would have never been
attempting to develop an exponent.
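For reference, a minimal sketch of the RPG^z exponent described above follows. The function names and the default z = .29 are my choices for illustration; z = .287 or .28 can be passed in just as easily.

```python
def pythagenpat_exponent(rpg, z=0.29):
    """Exponent x = RPG^z; equals 1 at 1 RPG for any z."""
    return rpg**z


def pythagenpat_wpct(r, ra, g, z=0.29):
    """Pythagorean W% with the exponent set by the team's own RPG."""
    x = pythagenpat_exponent((r + ra) / g, z)
    return r**x / (r**x + ra**x)
```

Since 1 raised to any power is 1, the (1,1) known point is satisfied no matter which value of z is used.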
Let's just close by illustrating the differences between the various methods
for a team that is fairly extreme--they outscore their opponents by a 2:1 ratio in a 5 RPG context(3.33 r/g, 1.67 ra/g):
Model            EW%
Cook             .968
Kross            .750
10 RPW           .666
Pyth(X=2)        .800
Palmer           .723
Sadowski         .680
Davenport        .739
Patriot/Smyth    .751
Although all of these methods with the glaring exception of Cook give a similar
standard error when applied to normal major league teams, the differences are quite large when extreme teams are involved. And while a method like Kross might track the Pythagenpat well in this case, there
are other cases where it will not. The same goes for all of the methods, although
Pythagenport and Pythagenpat are basically equivalent from around 5 to 30 RPG as you can see in the chart linked on this page.
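The table values can be recomputed with a short script. This is a sketch under a few assumptions of mine: Davenport's log is taken as base 10, the Patriot/Smyth exponent is taken as RPG^.29, and the team is entered as exactly 10/3 runs scored and 5/3 allowed per game, so a third decimal may differ slightly from a table built on rounded inputs.

```python
import math


def pyth(r, ra, x):
    return r**x / (r**x + ra**x)


def estimates(r, ra):
    """Expected W% by each model; r and ra are per-game rates."""
    rpg, rd, rr = r + ra, r - ra, r / ra
    return {
        "Cook": 0.484 * rr,
        "Kross": r / (2 * ra) if r < ra else 1 - ra / (2 * r),
        "10 RPW": rd / 10 + 0.5,
        "Pyth(X=2)": pyth(r, ra, 2),
        "Palmer": rd / (10 * math.sqrt(rpg / 9)) + 0.5,
        "Sadowski": pyth(r, ra, rpg / 4.6),
        "Davenport": pyth(r, ra, 1.5 * math.log10(rpg) + 0.45),
        "Patriot/Smyth": pyth(r, ra, rpg**0.29),
    }


table = estimates(10 / 3, 5 / 3)
for model, ew in table.items():
    print(f"{model:<14} {ew:.3f}")
```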
Although linear models do not have the best theoretical accuracy, there are
certain situations in which they can come in handy. What I did was use the Pythagenpat
method as the basis for a slope formula. We can calculate the slope that is in
effect for a team at any given point based on the Pythagorean method by knowing the exponent x(which I figured by Pythagenpat),
the Run Ratio, and the RPG. The formula for this, originally published by Smyth
but in a different form, is S = (RR^x/(RR^x+1)-.5)/(RPG*(2*RR/(RR+1)-1)). What
I did was calculate the needed slope for a team with RR 1.01, 1.05, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, and 2 at
each 1 RPG interval from 1-14. I then attempted to regress for a formula for
slope based on RPG. I eventually decided to cut out the teams from 1-4 RPG because
they simply were too different to fit into the model. But using the teams at
5-14, I came up with an equation that works fairly well in that range, S = .279-.182*log(RPG).
You can see in another set of charts linked below the needed slope at 1-14 RPG and a chart showing the actual needed
slope(marked Series 3) and the predicted slope(Series 2).
The fit is pretty good in that range, but caution should be used if you try to take it outside of the tested region. Applied to actual 1961-2000 teams projected to 162 games, it has a RMSE of 4.015,
comparable to the most accurate methods.
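Here is a sketch of both pieces: the slope a team "needs" under Pythagenpat, and the fitted slope formula. I am assuming the log in the fitted formula is base 10, since that is what produces a slope near .1 at 9 RPG; the function names and the RPG^.29 exponent default are also mine.

```python
import math


def needed_slope(rr, rpg, z=0.29):
    """Slope implied by Pythagenpat for a team with run ratio rr (rr != 1)."""
    x = rpg**z
    wpct = rr**x / (rr**x + 1)
    rd_per_game = rpg * (2 * rr / (rr + 1) - 1)
    return (wpct - 0.5) / rd_per_game


def fitted_slope(rpg):
    """Regression fit from the text, intended for roughly 5-14 RPG."""
    return 0.279 - 0.182 * math.log10(rpg)


for rpg in (5, 9, 14):
    print(rpg, round(needed_slope(1.1, rpg), 4), round(fitted_slope(rpg), 4))
```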
Finally, I think it would be useful to observe that in all of these methods,
four basic components pop up a lot: Runs Per Game(RPG), Run Ratio(RR), Run Percentage(R%), and Run Differential Per Game(RD:G). I have provided the formulas for each of these and formulas that you can use to convert
between themselves, something that is technical math crap instead of real sabermetric knowledge, but I find the conversion
formulas useful:
RPG = (R+RA)/G
RR = R/RA
R% = R/(R+RA)
RD:G = (R-RA)/G
RR = R%/(1-R%)
RR = (RD:G/(2*RPG)+.5)/(.5-RD:G/(2*RPG))
R% = RR/(RR+1)
R% = (RD:G+RPG)/(2*RPG) = RD:G/(2*RPG) + .5
RD:G = RPG*(2*R%-1)
RD:G = RPG*(2*RR/(RR+1)-1)
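The conversions above translate directly into code. This is just a sketch with function names of my own choosing; rd stands for RD:G throughout.

```python
def rr_from_rpct(rpct):
    return rpct / (1 - rpct)


def rr_from_rd(rd, rpg):
    return (rd / (2 * rpg) + 0.5) / (0.5 - rd / (2 * rpg))


def rpct_from_rr(rr):
    return rr / (rr + 1)


def rpct_from_rd(rd, rpg):
    return rd / (2 * rpg) + 0.5


def rd_from_rpct(rpct, rpg):
    return rpg * (2 * rpct - 1)


def rd_from_rr(rr, rpg):
    return rpg * (2 * rr / (rr + 1) - 1)
```

For example, a team scoring 5.5 and allowing 4.5 runs per game has RPG = 10, RD:G = 1, R% = .55, and RR of about 1.222, and each function recovers its value from the others.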
This chart compares the values of the Davenport exponent, 1.5log(RPG)+.45 against that of David Smyth and myself(RPG^.29)
at each .1 interval of RPG from 1 to 20.
Davenport v. Smyth/Patriot Exponents
ADDED 12/2004: Pythagorean RPW and Slope Based on Calculus
NOTE: Some(most? All?) of the insights and formulas in this section have previously been published and discussed in some
way, shape, or form by Ben Vollmayr-Lee on Primer or Fanhome and by Ralph Caola in the November 2003, February 2004 and May
2004 "By the Numbers". However, for the sake of my sanity, I am going to write this only acknowledging their work when I
need it to further my discussion. But I just want it to be clear that I am not the first to publish this sort of stuff and
while I did have the idea to do some of this before I saw Caola's piece, he published and almost certainly actually did it
first. No slight to him or Ben intended at all. Also, while other places on this page I have used RD:G to mean run differential
per game, I will now use RD to mean RD:G as well to make these long formulas less difficult to read. Now…
The derivative of Run Differential with respect to Pythagorean Winning Percentage should be the marginal number of runs per
win. By marginal, I mean the number of runs necessary to add one more win. The marginal RPW given by the derivative process
will not be the actual RPW that the team purchased wins at. That can be found through Pyth, in this formula from David Smyth(I
have given its equivalent formula above, which is for slope):
RPW = 2*(R-RA)*(R^x + RA^x)/(R^x - RA^x)
Where everything is in terms of per game. Although I gave an equivalent to that above, I don't think I explained how to derive
it, so I will now. Start with:
W% = RD/RPW + .5
Rearrange to get RPW = RD/(W% - .5)
Now substitute Pythagorean W% for W% to get:
RPW = (R-RA)/(R^x/(R^x + RA^x) - .5)
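To see that this is equivalent to the Smyth equation above, put the denominator over a common denominator: R^x/(R^x + RA^x) - .5 = (R^x - RA^x)/(2*(R^x + RA^x)). Dividing (R-RA) by that fraction gives Smyth's form. As you can see from Smyth's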
version, when R = RA, the fraction and therefore RPW will be undefined. It turns out that at that point, the marginal RPW
as derived below will fill in and make the function continuous.
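The equivalence of the two RPW forms can also be spot-checked numerically. This is only a sketch with function names of my own; inputs are per-game rates and R must differ from RA.

```python
def rpw_smyth_form(r, ra, x):
    """RPW = 2*(R-RA)*(R^x + RA^x)/(R^x - RA^x)."""
    return 2 * (r - ra) * (r**x + ra**x) / (r**x - ra**x)


def rpw_from_pyth(r, ra, x):
    """RPW = RD / (Pythagorean W% - .5)."""
    wpct = r**x / (r**x + ra**x)
    return (r - ra) / (wpct - 0.5)


for r, ra, x in [(5.5, 4.5, 2.0), (3.0, 6.0, 1.83), (7.2, 4.1, 1.5)]:
    print(round(rpw_smyth_form(r, ra, x), 6), round(rpw_from_pyth(r, ra, x), 6))
```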
To go about getting dW%/dRD, we can first write:
W% = RR^x/(RR^x + 1)
and the identity that:
RR = (RD/(2*RPG) + .5)/(.5 - RD/(2*RPG)).
We differentiate that with respect to RD to get:
dRR/dRD = ((.5-RD/(2*RPG))*(1/(2*RPG))-(RD/(2*RPG) + .5)*(-1/(2*RPG)))/(.5 - RD/(2*RPG))^2.
That simplifies to:
dRR/dRD = 1/(2*RPG*(.5-RD/(2*RPG))^2)
We can then take the derivative of W% with respect to RR:
dW%/dRR = ((RR^x + 1)*(x*RR^(x-1))-RR^x*(x*RR^(x-1)))/(RR^x + 1)^2
We can then use:
dW%/dRR * dRR/dRD = dW%/dRD
Which is slope. Doing the math, we have:
S = x*RR^(x-1)/(2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2)
The reciprocal is RPW:
RPW = (2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2)/(x*RR^(x-1))
Caola worked out RPW for when x = 2 and got:
RPW = (RPG^2 + RD^2)^2/(RPG*(RPG^2 - RD^2))
Which is equivalent to mine when x = 2.
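A quick numerical sketch can confirm that the marginal RPW formula matches Caola's version at x = 2, and that it agrees with a finite-difference derivative of Pythagorean W%. The function names and the step size h are my own choices.

```python
def marginal_rpw(rpg, rd, x):
    """Reciprocal of dW%/dRD for Pythagorean with exponent x."""
    half_gap = 0.5 - rd / (2 * rpg)
    rr = (rd / (2 * rpg) + 0.5) / half_gap
    slope = x * rr**(x - 1) / (2 * rpg * (rr**x + 1) ** 2 * half_gap**2)
    return 1 / slope


def caola_rpw(rpg, rd):
    """Caola's closed form for x = 2."""
    return (rpg**2 + rd**2) ** 2 / (rpg * (rpg**2 - rd**2))


def pyth_from_rd(rpg, rd, x):
    r, ra = (rpg + rd) / 2, (rpg - rd) / 2
    return r**x / (r**x + ra**x)


def numeric_rpw(rpg, rd, x, h=1e-6):
    """Central-difference estimate of 1/(dW%/dRD), holding RPG fixed."""
    dw = pyth_from_rd(rpg, rd + h, x) - pyth_from_rd(rpg, rd - h, x)
    return 2 * h / dw
```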
This is all potentially useful, but we don't really care as much about the RPW for specific teams as we do about RPW for given
levels of RPG. What we can do is run these formulas for an average team at a given RPG; one with a RR of one and a RD of
zero(R = RA). If you do this with Caola's formula at exponent 2, you get:
RPW = (RPG^2 + 0^2)^2/(RPG*(RPG^2 - 0^2)) = RPG^4/RPG^3 = RPG
So at x = 2, an average team will have a RPW equal to their RPG, which is a very common sense approximation that people have
used for RPW. What about exponents other than two though? Using my formula:
RPW = 2*RPG*(1^x + 1)^2*(.5 - 0/(2*RPG))^2/(x*1^(x-1)) = 2*RPG/x
Or slope = x/(2*RPG)
I believe this was published by David Glass on rsbb almost a decade ago. Anyway, if you use a pyth exponent of 1.82,
you will get a slope of 1.82/(2*RPG) = .91/RPG...the Ben V-L .91*(R-RA)/(R+RA) formula for winning percentage. Since 1.82
is a very good historical estimate for Pythagorean exponent, it is then no surprise that the .91 multiplier gives one of the
very best simple linear fits for W%.
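The average-team result is simple enough to sketch in a few lines. The names are mine, and 1.82 is just the historical exponent mentioned above.

```python
def average_team_slope(rpg, x=1.82):
    """Marginal slope for a team with R = RA: x / (2*RPG)."""
    return x / (2 * rpg)


def average_team_rpw(rpg, x=1.82):
    return 2 * rpg / x


def linear_wpct_from_exponent(r, ra, x=1.82):
    """RD:G times x/(2*RPG) collapses to (x/2)*(R-RA)/(R+RA)."""
    return (x / 2) * (r - ra) / (r + ra) + 0.5
```

Multiplying RD:G by the slope cancels the games and leaves (x/2)*(R-RA)/(R+RA), which is where the .91 multiplier comes from when x = 1.82.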
Above, I mentioned Bill James' "Double the Edge" method. This method is not a particularly accurate one, but it
has some cool properties and I think it is worth spending a little bit of time on. First, let's define Win Ratio(WR = W/L)
and of course Run Ratio(RR = R/RA). DTE states that the relationship between them is:
WR = 2*RR-1
Pythagorean states that:
WR = RR^x or in the most common form, WR = RR^2
Then, in DTE or Pyth, W% = WR/(WR + 1) and WR = W%/(1-W%).
DTE states the relationship between RR and WR in a linear regression fashion, y = mx + b, where y is WR, m is the slope,
x is the RR, and b is the intercept. So one way to find m and b is to do a linear regression.
But there is another way as well, and that is to do some calculus based on Pythagorean. In calculus, you can find the
tangent line to a function at a given point. This line has the slope of the derivative of the function at that point, and
intersects the graph of the function. For a better explanation, find a calc professor or something, because that's as best
as I can describe it. Anyway, in the relationship between WR and RR, there is one point, a "known point" at which
we know the exact relationship between them. That point is that when a team scores the same number of runs it allows(RR =
1), it will be a .500 team(WR = 1). What is the tangent line to Pythagorean at this point?
We can write the equation of the tangent line as:
y - y1 = m*(x - x1)
Where y is WR, y1 is the given WR(1), m is the slope or derivative of the Pythagorean equation at that point, x is the
RR, and x1 is the given RR(1). The derivative of WR = RR^x with respect to RR is dWR/dRR = x*RR^(x-1). So at RR = 1, the derivative
is simply x--the pythagorean exponent. We'll use x = 2 for Pyth, and substitute everything in:
WR - 1 = 2*(RR - 1)
This simplifies to:
WR = 2*RR - 1
Double the Edge. So Double the Edge is actually the tangent line of the Pythagorean WR equation at the point where RR
= WR = 1. The general form of the equation is:
WR - WR1 = (x*RR1^(x-1))*(RR - RR1)
This tangent approximation could be used at any RR/WR combo--except we do not intrinsically know what the WR should be
for a 1.5 RR team for instance. That's why we need Run-to-Win converters like Pyth in the first place. So we cannot use
the general form, and the one that we can use for all teams is the one based on WR = RR = 1.
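The tangent-line construction can be sketched as follows. The function name is mine, and the slope is the derivative of WR = RR^x evaluated at the chosen point, which at RR = 1 is just x.

```python
def tangent_wr(rr, x=2.0, rr1=1.0):
    """Tangent-line approximation to WR = RR^x, taken at the point RR = rr1."""
    wr1 = rr1**x                    # WR on the Pythagorean curve at rr1
    slope = x * rr1 ** (x - 1)      # derivative of RR^x at rr1
    return wr1 + slope * (rr - rr1)


for rr in (0.8, 1.0, 1.2, 2.0):
    print(rr, tangent_wr(rr), rr**2)
```

With the defaults (x = 2, tangent at RR = 1) this is exactly WR = 2*RR - 1, and the printed columns show the line falling further below RR^2 as the run ratio moves away from 1.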
There is one other known point, WR = RR = 0. However, if you base a DTE equation on this, it is only close to Pyth at
EXTREMELY low Run Ratios, and will give RR = 1 team WR > 1.
The basic DTE equation caps W% at 1, but any team with a RR of .5 will have a 0 W%, and anything lower goes negative, so it doesn't have the upper AND lower
bounds which make Pyth unique among most win estimators. As it turns out(and not surprisingly if you've ever looked at the
graph of a parabola versus the graph of a straight line), DTE does not "kick in" fast enough at high RRs. This
can be seen through the calculus to find the RPW based on DTE. dW%/dRR from DTE is :
dW%/dRR = (x*(x*RR+2-x)-x*(x*RR-x+1))/(x*RR+2-x)^2
dRR/dRD = 1/(2*RPG*(.5-RD/(2*RPG))^2)
So dW%/dRR*dRR/dRD = dW%/dRD = slope = 1/RPW:
dW%/dRD = x/(2*RPG*(.5-RD/(2*RPG))^2*(x*RR+2-x)^2)
If you try that for an average team(RD = 0, RR = 1)
slope = x/(2*RPG*1/4*4) = x/(2*RPG)
Which is the same result you get from Pythagorean, unsurprising since it matches the Pyth value at the known point. Anyway,
though, if you put in a team with 10 RPG and 1 RD(1.222 RR) you get a Pyth RPW of 10.30 and a DTE RPW of 12.10. It is setting
the "win purchase price" way too high and that's why it doesn't work well for high RR teams. But for ordinary teams,
DTE is about as accurate as anything else.
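The 10 RPG, 1 RD comparison can be reproduced with a short sketch. The function names are mine, and both functions take x = 2 by default to match the example.

```python
def run_ratio(rpg, rd):
    return (rd / (2 * rpg) + 0.5) / (0.5 - rd / (2 * rpg))


def dte_marginal_rpw(rpg, rd, x=2.0):
    """1 / (dW%/dRD) for Double the Edge with slope x."""
    rr = run_ratio(rpg, rd)
    slope = x / (2 * rpg * (0.5 - rd / (2 * rpg)) ** 2 * (x * rr + 2 - x) ** 2)
    return 1 / slope


def pyth_marginal_rpw(rpg, rd, x=2.0):
    """1 / (dW%/dRD) for Pythagorean with exponent x."""
    rr = run_ratio(rpg, rd)
    slope = x * rr ** (x - 1) / (2 * rpg * (rr**x + 1) ** 2 * (0.5 - rd / (2 * rpg)) ** 2)
    return 1 / slope


print(round(pyth_marginal_rpw(10, 1), 2), round(dte_marginal_rpw(10, 1), 2))
```

At RD = 0 the two functions return the same value, 2*RPG/x, and they drift apart as the run differential grows.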
Just as 2 is not necessarily the most accurate exponent for Pyth, 2 is not necessarily the most accurate slope for DTE.
You can use regression or various Pyth exponent formulas to find the best exponent/slope(they are the same, at least at
the known point). If you do this:
WR = x*RR - (x - 1)
So W% = (x*RR - (x - 1))/(x*RR - (x - 1) + 1) = (x*RR-(x-1))/(x*RR + (2 - x))
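A sketch of the generalized DTE estimator, with a function name of my own and 1.82 used only as an illustrative slope:

```python
def dte_wpct(rr, x=2.0):
    """Generalized Double the Edge: WR = x*RR - (x - 1), W% = WR/(WR + 1)."""
    wr = x * rr - (x - 1)
    return wr / (wr + 1)


for rr in (0.5, 0.8, 1.0, 1.25, 2.0):
    print(rr, round(dte_wpct(rr), 3), round(dte_wpct(rr, 1.82), 3))
```

Any slope x returns .500 at RR = 1, and W% reaches zero at RR = (x - 1)/x, which is .5 for the basic x = 2 version.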
Another thing...this same concept applies to other sabermetric formulas. For example, Davenport had/has two forms for
EQR from RAW. One is (RAW/LgRAW)^2 = (R/PA)/(LgR/PA) and the other is 2*RAW/LgRAW - 1 = (R/PA)/(LgR/PA). These equations
are related in the same way Pyth and DTE are.
DTE is not an important topic in W% estimators, but the math elements interest me, so you got a rambling essay about it.