W% Estimators

While there are not quite as many winning percentage estimators as there are run estimators, there is no shortage of them. Maybe one of the reasons is that the primary method for determining W%, Bill James' Pythagorean method, is a fairly good method that does not have the obvious flaws of Runs Created--although it does have some. The inadequacy of Runs Created has always fueled innovation in the run estimation field.

BenV-L from the FanHome board has provided a classification system for win estimators, which is a little complex but does indeed make sense. He is a genuine math/stats guy, so I won't tread on his territory. I will propose a different classification system that approaches it from a slightly different angle.

First, we have the general area of methods that do not vary based on the run context. Under this, we have linear and non-linear methods. So, first we will look at static linear methods.

The static linear methods all are based in some way on runs minus runs allowed. Most of them take this general form:

W% = RD:G*S + .5

Where RD:G is Run Differential Per Game and S is slope. This is in the form of a basic linear regression, mx + b. Another way to write this, also very common, is:

W% = RD:G/RPW + .5

Where RPW is Runs Per Win, which is just the reciprocal of the slope. You could call the slope Wins Per Run, but I prefer sticking with the regression lingo.

For average major league play, the slope turns out to be about .1, or an RPW of 10. For instance, I often use a value of .107, which is based on a regression on 1970s data (more out of habit than anything). However, regression can also generate a formula that does not weight R and RA equally. One such method was published by Arnold Soolman, based on 1901-1970 data: W% = (.102*R - .103*RA)/G + .505. This equation appears to be based on multiple regression. Nothing forces R and RA to be given equal weight, or forces a team that scores as many runs as it allows to be predicted at .500, but that still seems like the inevitable choice to me.

Looking at Soolman's formula, a team that scores and allows 4 runs per game is predicted to play .501 baseball. This doesn't seem like a big deal, but let's consider the case of a league that has an average of 4 runs per game. The league would be predicted to play .501 baseball, which is obviously impossible. They would have to play .500 baseball. That is my logic for R=RA=.500 W%, and whether it is good enough is up to you.
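A minimal Python sketch of the two static linear forms, using the .1 slope and Soolman's coefficients quoted above (the function names are illustrative only), reproduces the .501 prediction for a 4-runs-per-game league:

```python
def linear_wpct(runs, runs_allowed, games, slope=0.1):
    """Static linear estimate: W% = RD:G * S + .5, where S = 1/RPW."""
    rd_per_game = (runs - runs_allowed) / games
    return rd_per_game * slope + 0.5


def soolman_wpct(runs, runs_allowed, games):
    """Soolman's regression on 1901-1970 data: W% = (.102*R - .103*RA)/G + .505."""
    return (0.102 * runs - 0.103 * runs_allowed) / games + 0.505


# A team (or a whole league) scoring and allowing 4 runs per game over 162 games
print(round(linear_wpct(648, 648, 162), 3))   # 0.5
print(round(soolman_wpct(648, 648, 162), 3))  # 0.501
```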

We also have non-linear methods that use constants. Earnshaw Cook was the first to actually publish a W% estimator, and it is in this category:

W% = R*.484/RA

A team with equal R and RA would be predicted at .484; if you use .5 as the constant instead, it comes out at .500 as it should.

Another example is the work of Bill Kross:

if R<RA, W% = R/(2*RA)

if R>RA, W% = 1 - RA/(2*R)

Another is "Double the Edge," a method that Bill James speculated would work but never actually used. It is as follows:

W% = (R/RA*2-1)/(R/RA*2)

The problem with many of these methods is that they obviously break down at the extremes. Using a slope of .1 with the linear method produces a W% of 1 at an RD:G of 5. But a team that scores 5.1 runs per game more than its opponents will not play 1.01 baseball. Cook's formula produces a W% over 1 for any run ratio above 2.07; it never produces a sub-zero W%, but it isn't accurate at all. The Kross formula simply does not provide a very accurate estimate, at least in comparison to other methods, although it does bound W% between 0 and 1. Double the Edge does not allow a W% above 1, but if the team's run ratio is under .5, it will produce a sub-zero winning percentage.
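The breakdowns described above are easy to see numerically. This is a small Python sketch of the three non-linear constant methods exactly as written earlier (function names are hypothetical), evaluated at a low, an average and a high run ratio:

```python
def cook_wpct(r, ra):
    # Earnshaw Cook: W% = .484 * R/RA, unbounded above
    return 0.484 * r / ra


def kross_wpct(r, ra):
    # Bill Kross: piecewise, always between 0 and 1
    if r <= ra:
        return r / (2 * ra)
    return 1 - ra / (2 * r)


def dte_wpct(r, ra):
    # Double the Edge: W% = (2*RR - 1)/(2*RR); negative whenever RR < .5
    rr = r / ra
    return (2 * rr - 1) / (2 * rr)


for rr in (0.4, 1.0, 2.1):
    print(rr, round(cook_wpct(rr, 1), 3), round(kross_wpct(rr, 1), 3), round(dte_wpct(rr, 1), 3))
```

At RR = 2.1 Cook's estimate exceeds 1, and at RR = .4 Double the Edge goes below 0, while Kross stays in bounds throughout.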

So every one of these methods is either inaccurate or capable of producing impossible answers. While all of these formulas will work decently with normal teams in normal scoring contexts, we need methods that work outside of the normal range. There are real .700 teams, and there are teams that play in a context where the two teams average 13 runs a game. And if we want to apply these methods to individuals at all, we definitely need a more versatile method.

Enter the Pythagorean Theorem. Bill James' formula, W% = R^2/(R^2 + RA^2), has a high degree of accuracy and stays within the bounds of 0 and 1. These attributes and its relative simplicity have made it the standard for many years. James would later proclaim that 1.83 was a better exponent. The formula by which he came to this conclusion was exponent = 2-1/(RPG-3). At the normal RPG of 9, this does produce an exponent of 1.83, but it gives 2.333 at 0 RPG and 2.5 at 1 RPG, is undefined at 3 RPG (shooting off toward positive infinity just below that point and negative infinity just above it), and approaches 2 as RPG goes to infinity. As we shall see later, that makes it a woefully inadequate and illogical formula.
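Here is a minimal Python sketch of the Pythagorean formula and of the context exponent 2-1/(RPG-3) as quoted above (function names are illustrative). Evaluating the exponent at a few RPG values shows its behavior around the singularity at 3 RPG:

```python
def pythag_wpct(r, ra, exponent=2.0):
    """Pythagorean W% = R^x / (R^x + RA^x); stays strictly between 0 and 1 for R, RA > 0."""
    return r ** exponent / (r ** exponent + ra ** exponent)


def james_exponent(rpg):
    """Context exponent as quoted in the text: x = 2 - 1/(RPG - 3). Undefined at RPG = 3."""
    return 2 - 1 / (rpg - 3)


for rpg in (0, 1, 2.9, 3.1, 9, 1000):
    print(rpg, round(james_exponent(rpg), 3))
```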

An off-the-wall sort of formula developed by the author is based on an article in the old STATS Pro Football Revealed work, which estimated the Z-score of a team's winning percentage and then converted it back into a W%. I applied this idea to baseball. It is automatically bounded by 0 and 1. I estimated the Z-score as 2.324*(R-RA)/(R+RA); you can then use the normal cumulative distribution function to convert it back into a W%.
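As a sketch of the conversion step, the normal cumulative function can be computed from Python's math.erf using the standard identity Phi(z) = (1 + erf(z/sqrt(2)))/2. The 2.324 coefficient is the one given above; the function names are illustrative:

```python
import math


def normal_cdf(z):
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def zscore_wpct(r, ra):
    # Estimated Z-score of W%, converted back into a W% with the normal CDF
    z = 2.324 * (r - ra) / (r + ra)
    return normal_cdf(z)


print(round(zscore_wpct(5.0, 4.0), 3))  # a 5 R/G, 4 RA/G team
```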

Now we go into methods that vary somewhat based on the scoring context. This is normally done in terms of Runs Per Game, (R+RA)/G. First, I should just point out that it might be possible to modify the Z-score W% and the Double the Edge method to somehow account for changing RPG, but no one has done so and since the methods aren't optimal, it would probably be a waste of time.

The linear methods that do this simply use a formula based on RPG to estimate RPW or slope before estimating W%. These linear formulas are still subject to the same caveats as the static linear methods: they are not bounded by 0 and 1. But they do add more flexibility, especially within the normal scoring ranges. There are a number of these methods, all of which, as Ben V-L found, produce very similar results. The most famous of these was developed by Pete Palmer: RPW = 10*sqrt(RPG/9). Others include David Smyth's W% = (R-RA)/(R+RA) + .5, which simply assumes that RPW = RPG. Ben V-L published the same formula except with (R-RA)/(R+RA) multiplied by .91, making RPW = 1.099*RPG. Another is Tango Tiger's simple RPW = RPG/2 + 5. Again, accuracy improves more from using any reasonable context-dependent slope than from picking the optimum one out of these choices.
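To illustrate how close these context-dependent RPW formulas run in a normal environment, here is a short Python sketch applying each one to a hypothetical 4.75 R/G, 4.25 RA/G team (9 RPG, +.5 RD:G). The function names are illustrative:

```python
import math


def palmer_rpw(rpg):
    return 10 * math.sqrt(rpg / 9)   # Pete Palmer


def smyth_rpw(rpg):
    return rpg                        # David Smyth: RPW = RPG


def benvl_rpw(rpg):
    return rpg / 0.91                 # Ben V-L: .91 multiplier, i.e. RPW = 1.099*RPG


def tango_rpw(rpg):
    return rpg / 2 + 5                # Tango Tiger


def linear_wpct(rd_per_game, rpw):
    return rd_per_game / rpw + 0.5


rpg, rd = 9.0, 0.5  # hypothetical team
for f in (palmer_rpw, smyth_rpw, benvl_rpw, tango_rpw):
    print(f.__name__, round(f(rpg), 2), round(linear_wpct(rd, f(rpg)), 3))
```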

Of course, as we said, the problems inherent in linear methods are not resolved just by using a flexible slope. The Pythagorean model provides the bounds at 0 and 1, and it is what we want to build upon. This will take the form of R^X/(R^X + RA^X).

There have been several published attempts to base X on RPG. One very simple one is RPG/4.6, from David Sadowski. The most famous is Clay Davenport's "Pythagenport", X = 1.5log(RPG) + .45, with the log taken to base 10. Davenport used some extreme data and modelling to find his optimal exponent, which he claims is accurate for RPG ranging between 4 and 40.

What about RPG under 4, though? Enter David Smyth. The inventor of Base Runs, the "natural" RC function, came up with a brilliant discovery, revelation, or what have you that allows for the finding of a better exponent. Although it seems remarkably obvious once you have been exposed to it, no one other than Mr. Smyth was able to think it up.

The concept is very simple. The minimum RPG possible in a game is 1, because if neither team scores, the game keeps going. If a team played 162 games at 1 RPG, it would win every game in which it scored a run and lose every game in which it allowed one, so W = R and L = RA. Therefore, to make W/(W+L) = R^X/(R^X + RA^X), X must be set equal to 1. This is a known point in the domain of the exponent: (1,1). Sadowski's formula would give an exponent of .22 at 1 RPG, causing a team that should go 100-62 (.617) to be predicted at .526. Davenport's comes up with .45, which would project a .554 W% for the team; that is closer, but still incorrect, and our formula has to work at the only point that we know to be true.
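The figures in the paragraph above can be checked with a short Python sketch (function names are illustrative). It assumes Davenport's log is base 10, which reproduces the .554 figure:

```python
import math


def pythag_wpct(r, ra, x):
    return r ** x / (r ** x + ra ** x)


def sadowski_exp(rpg):
    return rpg / 4.6


def davenport_exp(rpg):
    # Pythagenport: 1.5*log10(RPG) + .45
    return 1.5 * math.log10(rpg) + 0.45


# At 1 RPG a 100-62 team scores 100 runs and allows 62 over 162 games
r, ra, g = 100, 62, 162
rpg = (r + ra) / g  # 1.0
for name, x in (("Sadowski", sadowski_exp(rpg)),
                ("Davenport", davenport_exp(rpg)),
                ("Known point", 1.0)):
    print(name, round(x, 2), round(pythag_wpct(r, ra, x), 3))
```

Only the exponent of 1 returns the team's actual 100/162 record.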

So the search was on for an exponent that would 1) produce 1 at 1 RPG, 2) maintain accuracy for real major league teams, and 3) be accurate at high RPG. If criteria 1 and 2 were met but 3 was not, then the Davenport method would be preferable at some times and the new method at others. We want a method that can give us a reasonable estimate all of the time.

It turns out that this author, while fooling around with various regression models fed by the known point and Davenport's exponent at other points, found that RPG^.29 matched Davenport's method in the range where a match was desired. Although I posted it on FanHome, nobody really noticed. A few months later, David Smyth posted RPG^.287, saying that he thought it was an exponent that would fit all of our needs. Bingo. Tango Tiger ran some tests which are linked below and found that RPG^.28 might be the best, but the Patriot/Smyth exponent is the one that, at least to this time, has been shown to produce the optimal results. Some people have taken to calling this Pythagenpat, a takeoff on Pythagenport, but it should always be remembered that Smyth recognized the usefulness of this method to a greater extent than I did and that without his (1,1) discovery, I would have never been attempting to develop an exponent.

Let's close by illustrating the differences between the various methods for a team that is fairly extreme: they outscore their opponents by a 2:1 ratio in a 5 RPG context (3.33 R/G, 1.67 RA/G):

Model            EW%
Cook             .968
Kross            .750
10 RPW           .666
Pyth(X=2)        .800
Palmer           .723
Sadowski         .680
Davenport        .739
Patriot/Smyth    .751
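The table can be regenerated from the formulas given earlier with a short Python sketch. It assumes exact per-game values (10/3 and 5/3), a base-10 log for Davenport, and the .287 Pythagenpat exponent, so a few entries may differ from the table in the third decimal place, presumably because of rounding in the original inputs:

```python
import math

R, RA = 10 / 3, 5 / 3          # 2:1 run ratio at 5 RPG, per game
RPG, RD, RR = R + RA, R - RA, R / RA


def pyth(x):
    # Pythagorean W% from the run ratio with exponent x
    return RR ** x / (RR ** x + 1)


estimates = {
    "Cook": 0.484 * RR,
    "Kross": 1 - RA / (2 * R),                 # R > RA branch
    "10 RPW": RD / 10 + 0.5,
    "Pyth(X=2)": pyth(2),
    "Palmer": RD / (10 * math.sqrt(RPG / 9)) + 0.5,
    "Sadowski": pyth(RPG / 4.6),
    "Davenport": pyth(1.5 * math.log10(RPG) + 0.45),
    "Patriot/Smyth": pyth(RPG ** 0.287),
}
for name, w in estimates.items():
    print(f"{name:<14} {w:.3f}")
```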

Although all of these methods, with the glaring exception of Cook's, give a similar standard error when applied to normal major league teams, the differences are quite large when extreme teams are involved. And while a method like Kross might track Pythagenpat well in this case, there are other cases where it will not. The same goes for all of the methods, although Pythagenport and Pythagenpat are basically equivalent from around 5 to 30 RPG, as you can see in the chart linked on this page.

Although linear models do not have the best theoretical accuracy, there are certain situations in which they can come in handy. What I did was use the Pythagenpat method as the basis for a slope formula. We can calculate the slope that is in effect for a team at any given point based on the Pythagorean method by knowing the exponent x (which I figured by Pythagenpat), the Run Ratio, and the RPG. The formula for this, originally published by Smyth but in a different form, is S = (RR^x/(RR^x+1)-.5)/(RPG*(2*RR/(RR+1)-1)), which is just (W% - .5)/RD:G with the Pythagorean W% and RD:G written in terms of RR and RPG. I calculated the needed slope for a team with RR 1.01, 1.05, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, and 2 at each 1 RPG interval from 1-14, and then attempted to regress a formula for slope based on RPG. I eventually decided to cut out the teams from 1-4 RPG because they were simply too different to fit into the model. Using the teams at 5-14 RPG, I came up with an equation that works fairly well in that range, S = .279-.182*log(RPG), with the log taken to base 10. You can see in another set of charts linked below the needed slope at 1-14 RPG and a chart showing the actual needed slope (marked Series 3) and the predicted slope (Series 2). The fit is pretty good in that range, but caution should be used if you try to take it outside of the tested region. Applied to actual 1961-2000 teams projected to 162 games, it has an RMSE of 4.015, comparable to the most accurate methods.
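A minimal Python sketch of the comparison just described: the slope needed by Pythagenpat (exponent RPG^.287 assumed) versus the fitted S = .279-.182*log10(RPG). The function names are illustrative, and RR = 1.2 is just one of the run ratios from the grid above:

```python
import math


def pythagenpat_exp(rpg, power=0.287):
    return rpg ** power


def needed_slope(rr, rpg):
    """Slope implied by Pythagenpat: S = (W% - .5) / RD:G."""
    x = pythagenpat_exp(rpg)
    wpct = rr ** x / (rr ** x + 1)
    rd_per_game = rpg * (2 * rr / (rr + 1) - 1)
    return (wpct - 0.5) / rd_per_game


def regression_slope(rpg):
    # Fitted on 5-14 RPG only; base-10 log
    return 0.279 - 0.182 * math.log10(rpg)


for rpg in (5, 9, 14):
    print(rpg, round(needed_slope(1.2, rpg), 4), round(regression_slope(rpg), 4))
```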

Finally, I think it is useful to observe that four basic components pop up a lot in all of these methods: Runs Per Game (RPG), Run Ratio (RR), Run Percentage (R%), and Run Differential Per Game (RD:G). I have provided the formulas for each of these and formulas that you can use to convert between them. This is technical math crap rather than real sabermetric knowledge, but I find the conversion formulas useful:

RPG = (R+RA)/G

RR = R/RA

R% = R/(R+RA)

RD:G = (R-RA)/G

RR = R%/(1-R%)

RR = (RD:G/(2*RPG)+.5)/(.5-RD:G/(2*RPG))

R% = RR/(RR+1)

R% = (RD:G+RPG)/(2*RPG) = RD:G/(2*RPG) + .5

RD:G = RPG*(2*R%-1)

RD:G = RPG*(2*RR/(RR+1)-1)
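The conversion formulas can be sanity-checked with a small Python sketch that computes the four components from a hypothetical team's season totals (800 runs scored, 700 allowed, 162 games) and then recovers RR and RD:G by each route listed above. The function names are illustrative:

```python
def to_components(r, ra, g):
    """Return (RPG, RR, R%, RD:G) from season run totals and games."""
    rpg = (r + ra) / g
    rr = r / ra
    rpct = r / (r + ra)
    rdg = (r - ra) / g
    return rpg, rr, rpct, rdg


def rr_from_rpct(rpct):
    return rpct / (1 - rpct)


def rr_from_rd(rdg, rpg):
    return (rdg / (2 * rpg) + 0.5) / (0.5 - rdg / (2 * rpg))


def rpct_from_rr(rr):
    return rr / (rr + 1)


def rpct_from_rd(rdg, rpg):
    return rdg / (2 * rpg) + 0.5


def rd_from_rpct(rpct, rpg):
    return rpg * (2 * rpct - 1)


def rd_from_rr(rr, rpg):
    return rpg * (2 * rr / (rr + 1) - 1)


# Hypothetical team: 800 runs scored, 700 allowed, 162 games
rpg, rr, rpct, rdg = to_components(800, 700, 162)
print(rr, rr_from_rpct(rpct), rr_from_rd(rdg, rpg))      # three routes to the same RR
print(rdg, rd_from_rpct(rpct, rpg), rd_from_rr(rr, rpg))  # three routes to the same RD:G
```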

ADDED 12/2004: Pythagorean RPW and Slope Based on Calculus

NOTE: Some (most? all?) of the insights and formulas in this section have previously been published and discussed in some way, shape, or form by Ben Vollmayr-Lee on Primer or FanHome and by Ralph Caola in the November 2003, February 2004 and May 2004 "By the Numbers". However, for the sake of my sanity, I am going to write this acknowledging their work only when I need it to further my discussion. I just want it to be clear that I am not the first to publish this sort of stuff, and while I did have the idea to do some of this before I saw Caola's piece, he published it and almost certainly did it first. No slight to him or Ben is intended at all. Also, while elsewhere on this page I have used RD:G to mean run differential per game, I will now use RD to mean the same thing, to make these long formulas easier to read. Now…

The derivative of Run Differential with respect to Pythagorean Winning Percentage should be the marginal number of runs per win. By marginal, I mean the number of runs necessary to add one more win. The marginal RPW given by the derivative will not be the average RPW at which the team actually purchased its wins. That can be found through Pyth, using this formula from David Smyth (I have given its equivalent above, in the form of a slope):

RPW = 2*(R-RA)*(R^x + RA^x)/(R^x - RA^x)

where everything is per game. Although I gave an equivalent to that above, I don't think I explained how to derive it, so I will now. Start with:

W% = RD/RPW + .5

Rearrange to get RPW = RD/(W% - .5)

Now substitute Pythagorean W% for W% to get:

RPW = (R-RA)/(R^x/(R^x + RA^x) - .5)

I haven't done the algebra, but trust me, that equation is equivalent to the Smyth equation above. As you can see from Smyth's version, when R = RA, the fraction becomes 0/0 and RPW is undefined. It turns out that at that point, the marginal RPW as derived below fills in and makes the function continuous.
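For readers who want the skipped step, the algebra takes one line: write the .5 over the common denominator 2(R^x + RA^x) and invert the resulting fraction.

```latex
\mathrm{RPW}
  = \frac{R - \mathit{RA}}{\dfrac{R^x}{R^x + \mathit{RA}^x} - \dfrac{1}{2}}
  = \frac{R - \mathit{RA}}{\dfrac{R^x - \mathit{RA}^x}{2\,(R^x + \mathit{RA}^x)}}
  = \frac{2\,(R - \mathit{RA})\,(R^x + \mathit{RA}^x)}{R^x - \mathit{RA}^x}
```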

To go about getting dW%/dRD, we can first write:

W% = RR^x/(RR^x + 1)

and the identity that:

RR = (RD/(2*RPG) + .5)/(.5 - RD/(2*RPG)).

We differentiate that with respect to RD to get:

dRR/dRD = ((.5-RD/(2*RPG))*(1/(2*RPG))-(RD/(2*RPG) + .5)*(-1/(2*RPG)))/(.5 - RD/(2*RPG))^2.

That simplifies to:

dRR/dRD = 1/(2*RPG*(.5-RD/(2*RPG))^2)

We can then take the derivative of W% with respect to RR:

dW%/dRR = ((RR^x + 1)*(x*RR^(x-1))-RR^x*(x*RR^(x-1)))/(RR^x + 1)^2

We can then use:

dW%/dRR * dRR/dRD = dW%/dRD

Which is slope. Doing the math, we have:

S = x*RR^(x-1)/(2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2)

The reciprocal is RPW:

RPW = 2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2/(x*RR^(x-1))

Caola worked out RPW for when x = 2 and got:

RPW = (RPG^2 + RD^2)^2/(RPG*(RPG^2 - RD^2))

Which is equivalent to mine when x = 2.
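One way to check the chain-rule result is numerically. The Python sketch below (function names are illustrative) compares the analytic slope against a central finite-difference derivative of Pythagorean W% with respect to RD at fixed RPG, and compares the reciprocal at x = 2 with Caola's closed form:

```python
def _rr(rd, rpg):
    # RR = (RD/(2*RPG) + .5) / (.5 - RD/(2*RPG))
    return (rd / (2 * rpg) + 0.5) / (0.5 - rd / (2 * rpg))


def pyth_wpct_from_rd(rd, rpg, x):
    rr = _rr(rd, rpg)
    return rr ** x / (rr ** x + 1)


def marginal_slope(rd, rpg, x):
    # S = x*RR^(x-1) / (2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2)
    rr = _rr(rd, rpg)
    v = 0.5 - rd / (2 * rpg)
    return x * rr ** (x - 1) / (2 * rpg * (rr ** x + 1) ** 2 * v ** 2)


def numeric_slope(rd, rpg, x, h=1e-6):
    # Central finite difference of Pythagorean W% with respect to RD (RPG held fixed)
    return (pyth_wpct_from_rd(rd + h, rpg, x) - pyth_wpct_from_rd(rd - h, rpg, x)) / (2 * h)


def caola_rpw(rd, rpg):
    # Caola's closed form for x = 2
    return (rpg ** 2 + rd ** 2) ** 2 / (rpg * (rpg ** 2 - rd ** 2))


rd, rpg = 1.0, 10.0
print(marginal_slope(rd, rpg, 1.83), numeric_slope(rd, rpg, 1.83))  # should agree closely
print(1 / marginal_slope(rd, rpg, 2.0), caola_rpw(rd, rpg))          # should agree closely
```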

This is all potentially useful, but we don't really care as much about the RPW for specific teams as we do about RPW for given levels of RPG. What we can do is run these formulas for an average team at a given RPG; one with a RR of one and a RD of zero(R = RA). If you do this with Caola's formula at exponent 2, you get:

RPW = (RPG^2 + 0^2)^2/(RPG*(RPG^2 - 0^2)) = RPG^4/RPG^3 = RPG

So at x = 2, an average team will have an RPW equal to its RPG, which is a very common-sense approximation that people have used for RPW. What about exponents other than two, though? Using my formula:

RPW = 2*RPG*(1^x + 1)^2*(.5 - 0/(2*RPG))^2/(x*1^(x-1)) = 2*RPG/x

Or slope = x/(2*RPG)

I believe this was published by David Glass on rsbb almost a decade ago. Anyway, if you use a Pyth exponent of 1.82, you get a slope of 1.82/(2*RPG) = .91/RPG, so W% = .91*RD/RPG + .5 = .91*(R-RA)/(R+RA) + .5, which is exactly the Ben V-L formula. Since 1.82 is a very good historical estimate for the Pythagorean exponent, it is no surprise that the .91 multiplier gives one of the very best simple linear fits for W%.
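A quick Python check of the average-team result, assuming the general marginal slope derived above (function names are illustrative): at RD = 0 it collapses to x/(2*RPG), and x = 1.82 yields the .91 multiplier on (R-RA)/(R+RA).

```python
def marginal_slope(rd, rpg, x):
    # General Pythagorean marginal slope dW%/dRD from the derivation above
    rr = (rd / (2 * rpg) + 0.5) / (0.5 - rd / (2 * rpg))
    v = 0.5 - rd / (2 * rpg)
    return x * rr ** (x - 1) / (2 * rpg * (rr ** x + 1) ** 2 * v ** 2)


def average_team_slope(rpg, x):
    # Closed form at RD = 0, RR = 1: slope = x / (2*RPG)
    return x / (2 * rpg)


rpg = 9.0
for x in (1.82, 2.0):
    print(x, marginal_slope(0.0, rpg, x), average_team_slope(rpg, x))

# With x = 1.82: W% = slope*RD + .5 = .91*RD/RPG + .5 = .91*(R-RA)/(R+RA) + .5
print(round(average_team_slope(rpg, 1.82) * rpg, 2))  # 0.91
```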

Above, I mentioned Bill James' "Double the Edge" method. This method is not a particularly accurate one, but it has some cool properties and I think it is worth spending a little bit of time on. First, let's define Win Ratio(WR = W/L) and of course Run Ratio(RR = R/RA). DTE states that the relationship between them is:
WR = 2*RR-1
Pythagorean states that:
WR = RR^x or in the most common form, WR = RR^2
Then, in both DTE and Pyth, W% = WR/(WR + 1) and WR = W%/(1-W%).
DTE states the relationship between RR and WR in linear regression fashion, y = mx + b, where y is WR, m is the slope, x is the RR, and b is the intercept. So one way to find m and b is to do a linear regression.

But there is another way as well, and that is to do some calculus based on Pythagorean. In calculus, you can find the tangent line to a function at a given point. This line has a slope equal to the derivative of the function at that point, and it touches the graph of the function at that point. For a better explanation, find a calc professor or something, because that's the best I can describe it. Anyway, in the relationship between WR and RR, there is one point, a "known point", at which we know the exact relationship between them: when a team scores the same number of runs it allows (RR = 1), it will be a .500 team (WR = 1). What is the tangent line to Pythagorean at this point?

We can write the equation of the tangent line as:
y - y1 = m*(x - x1)
Where y is WR, y1 is the given WR (1), m is the slope or derivative of the Pythagorean equation at that point, x is the RR, and x1 is the given RR (1). The derivative of WR = RR^x with respect to RR is dWR/dRR = x*RR^(x-1). So at RR = 1, the derivative is simply x, the Pythagorean exponent. We'll use x = 2 for Pyth, and substitute everything in:
WR - 1 = 2*(RR - 1)
This simplifies to:
WR = 2*RR - 1
Double the Edge. So Double the Edge is actually the tangent line of the Pythagorean WR equation at the point where RR = WR = 1. The general form of the equation is:
WR - WR1 = (x*RR1^(x-1))*(RR - RR1)
This tangent approximation could be used at any RR/WR combo, except that we do not intrinsically know what the WR should be for, say, a 1.5 RR team. That's why we need Run-to-Win converters like Pyth in the first place. So we cannot use the general form, and the one that we can use for all teams is the one based on WR = RR = 1.

There is one other known point, WR = RR = 0. However, if you base a DTE equation on this, it is only close to Pyth at EXTREMELY low Run Ratios, and will give RR = 1 team WR > 1.

The basic DTE equation caps W% at 1, but any team with an RR below .5 will have a negative W%, so it doesn't have the upper AND lower bounds that set Pyth apart from most W% estimators. As it turns out (and not surprisingly if you've ever compared the graph of a parabola with the graph of a straight line), DTE does not "kick in" fast enough at high RRs. This can be seen through the calculus to find the RPW based on DTE. dW%/dRR from DTE is:
dW%/dRR = (x*(x*RR+2-x)-x*(x*RR-x+1))/(x*RR+2-x)^2
dRR/dRD = 1/(2*RPG*(.5-RD/(2*RPG))^2)
So dW%/dRR*dRR/dRD = dW%/dRD = slope = 1/RPW:
dW%/dRD = x/(2*RPG*(.5-RD/(2*RPG))^2*(x*RR+2-x)^2)
If you try that for an average team (RD = 0, RR = 1):
slope = x/(2*RPG*1/4*4) = x/(2*RPG)
Which is the same result you get from Pythagorean, unsurprising since DTE is tangent to Pyth at the known point. However, if you put in a team with 10 RPG and 1 RD (1.222 RR), you get a Pyth RPW of 10.30 and a DTE RPW of 12.10 (both with x = 2). DTE sets the "win purchase price" way too high, and that's why it doesn't work well for high RR teams. But for ordinary teams, DTE is about as accurate as anything else.
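The 10.30 and 12.10 figures can be reproduced with a short Python sketch of the two marginal RPW formulas (the reciprocals of the Pyth and DTE slopes above, with x = 2 by default; function names are illustrative):

```python
def rr_from_rd(rd, rpg):
    return (rd / (2 * rpg) + 0.5) / (0.5 - rd / (2 * rpg))


def pyth_marginal_rpw(rd, rpg, x=2.0):
    # RPW = 2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2 / (x*RR^(x-1))
    rr = rr_from_rd(rd, rpg)
    v = 0.5 - rd / (2 * rpg)
    return 2 * rpg * (rr ** x + 1) ** 2 * v ** 2 / (x * rr ** (x - 1))


def dte_marginal_rpw(rd, rpg, x=2.0):
    # Reciprocal of dW%/dRD = x / (2*RPG*(.5 - RD/(2*RPG))^2*(x*RR + 2 - x)^2)
    rr = rr_from_rd(rd, rpg)
    v = 0.5 - rd / (2 * rpg)
    return 2 * rpg * v ** 2 * (x * rr + 2 - x) ** 2 / x


print(round(pyth_marginal_rpw(1, 10), 2), round(dte_marginal_rpw(1, 10), 2))  # 10.3 12.1
print(pyth_marginal_rpw(0, 10), dte_marginal_rpw(0, 10))                      # equal at the known point
```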

Just as 2 is not necessarily the most accurate exponent for Pyth, 2 is not necessarily the most accurate slope for DTE. You can use regression or various Pyth exponent formulas to find the best exponent/slope (they are the same, at least at the known point). If you do this, the general form is:
WR = x*RR - (x - 1)
So W% = (x*RR - (x - 1))/(x*RR - (x - 1) + 1) = (x*RR-(x-1))/(x*RR + (2 - x))
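As a sketch of the generalized form, the Python below (function names are illustrative; x = 1.83 is just an example value) checks that generalized DTE and Pyth share the same value and the same dW%/dRR at RR = 1, then compares them away from that point:

```python
def pyth_wpct(rr, x):
    """Pythagorean W% from run ratio."""
    return rr ** x / (rr ** x + 1)


def dte_wpct(rr, x=2.0):
    """Generalized DTE: WR = x*RR - (x - 1), so W% = (x*RR - (x-1)) / (x*RR + (2 - x))."""
    return (x * rr - (x - 1)) / (x * rr + (2 - x))


def slope_at(f, rr, x, h=1e-6):
    """Central finite-difference estimate of dW%/dRR."""
    return (f(rr + h, x) - f(rr - h, x)) / (2 * h)


x = 1.83  # illustrative exponent/slope
print(dte_wpct(1.0, x), pyth_wpct(1.0, x))                      # equal at the known point
print(slope_at(dte_wpct, 1.0, x), slope_at(pyth_wpct, 1.0, x))  # both about x/4
for rr in (0.4, 1.5, 3.0):
    print(rr, round(dte_wpct(rr, x), 3), round(pyth_wpct(rr, x), 3))
```

Below RR = .5 or so the DTE value goes negative, and at high RRs it trails Pyth, consistent with the discussion above.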

Another thing...this same concept applies to other sabermetric formulas. For example, Davenport had/has two forms for EQR from RAW. One is (RAW/LgRAW)^2 = (R/PA)/(LgR/PA) and the other is 2*RAW/LgRAW - 1 = (R/PA)/(LgR/PA). These equations are related in the same way Pyth and DTE are.

DTE is not an important topic in W% estimators, but the math elements interest me, so you got a rambling essay about it.