New and Superior Apportionment Method

By Warren D. Smith, January 2007

Let Pk be the (known) population of state k, and let Sk be its number of congressmen, where the total number S=∑kSk of congressmen is fixed by US law to be S=435.

The question is: what should the Sk be? The difficulty is that they have to be integers.

The new method we propose is very simple. Choose a number Q>0, traditionally called "the divisor," so that the total number of seats S will come out correct, and let

Sk = ⌈Pk/Q - d⌉;

where d is a constant that had been chosen ahead of time. There are two ways to choose d. First, we can choose it to yield the best retrospective performance (least "bias") over past US history. Second, we can instead choose d based on a purely theoretical analysis, which yields the following formula:

d = (1/K) ln( K/[1-exp(-K)] ),     where    K = #states/#seats.

With the USA's K=50/435=10/87=0.11494252873... we get d=0.495211255149063832... For application to "party list" PR voting systems, instead K=#parties/#seats, and with typical values #parties=20 and #seats=300 we would get d=0.49722. In the limit K→0+ we would get d=1/2 (Webster), and in the opposite limit K→1 we would get d=1-ln(e-1)≈0.458675.

1. Why the new method is superior to all previous ones

First of all, the classic apportionment methods of Webster, Adams, and Jefferson are merely special cases of our new method, with d=½, d=0+, and d=1- respectively. If we choose the optimal value of d, then we must get a method at least as good as these (by whatever performance measure we are optimizing). Because Balinski & Young's book concluded Webster (and in some ways Jefferson) were the best methods among the ones they knew (and computer simulations by Dan Bishop and myself also came to that same conclusion), the new method is at least as good as all previous to the extent you buy Balinski & Young.

Second, because the new method is a "divisor method" it enjoys all the properties that, B&Y proved, divisor methods (and only divisor methods) always enjoy: "House monotonicity," "population-pair monotonicity," and freedom from the "new state paradox."

The Hamilton/Vinton method fails all three kinds of monotonicity. The Balinski-Young method fails two of them but obeys "House monotonicity" and both it and Hamilton/Vinton obey the "quota criterion" that the number of seats a state is awarded differs by at most 1 from its ideal value. [All divisor methods fail the quota criterion in suitable situations; and indeed the quota and population-pair-monotonicity criteria are logically incompatible.]

Third, computer simulations by Dan Bishop's computer program (as well as an upcoming more powerful program of my own) found superior new-record behavior for the new method (using the theoretical model's d) over all previous divisor methods both in terms of several bias measures, and in terms of making "quota violations" get rarer. (Note: this computer finding was not merely a trivial and expected consequence of our derivation below, because the bias measures the computer evaluated were not the same as the one nulled out by our derivation, and also because the computer examined other probabilistic models.)

Fourth, both our computer simulations and US history agree that Webster is slightly biased in favor of the large-population states, i.e. it tends to give them slightly more than their ideal (but noninteger) fair share of congressmen. (And Adams is heavily biased in favor of small states, and Jefferson is heavily biased in favor of large ones.) The new method with d=0.495 perturbs Webster in the right direction to get rid of that bias, or alternatively our theoretical model that outputs d=0.495 may be regarded as having predicted Webster's bias.

2. The theoretical model and derivation

As our starting point, we shall assume the state populations are independent identically distributed exponential random variables, i.e. with density     C exp(-xC) dx     on x>0, for some constant C>0 whose value will not matter or anyhow will be derivable from census data, see below. (We shall discuss later why, among all simplistic models, only exponential densities make sense. Attempts to use, e.g, uniform densities in intervals, will be argued to be senseless.) In that case, the states whose "quotas" (the traditional word for "ideal [albeit perhaps noninteger] seat-count") lie between two given consecutive integers n and n+1, will have probability density for those quotas given by   K' exp(-xK)   for n<x<n+1 for some constants K' and K where K is independent of both x and n. Indeed, K is given by the formula K=#states/#seats, because this and only this formula, causes the expectation value of x to come out right – to 1/K.

That is: the probability density proportional to exp(-xK)dx is postulated to hold for state populations measured in units of one ideal seat's worth of population. In that case, the expectation value of x, which is 1/K, must equal #seats/#states, i.e. the correct average number of seats a state gets.

In other words:

  1. The average population of a state (measured in units of 1 ideal seat) is #seats/#states.
  2. This is set equal to the expected value of the population probability density.
It also is possible to regard our K=#states/#seats formula as a "maximum likelihood statistical estimator." That is, the "likelihood" of the m state populations x1, x2, …, xm is ∏1≤j≤mKexp(-Kxj) so that
LogLikelihood = m·ln(K) - K∑1≤j≤mxj = m·ln(K) - K·m·mean(x1, x2, …, xm),
and maximizing the LogLikelihood by choice of K, we have
m/K = m·meanx
which is the same as our old formula K=1/meanx (where as usual state populations are measured in units of one ideal seat's worth) which we had got in a less-hifalutin way by just matching means.

Now suppose we wish to choose a threshhold y, n<y<n+1, so that, if these states have quota>y, their seat-count is rounded up to n+1, otherwise down to n; and we wish to choose y so that the net expected additive effect of the rounding is zero, i.e. so that the rounding is unbiased. The formula that determines y is then

n≤x≤y (x-n) exp(-x K) dx - ∫y≤x≤n+1 (n+1-x) exp(-x K) dx = 0
which simplifies to
K exp(-Ky) + exp(-Kn-K) = exp(-Kn)
whose solution is
y = n + (1/K) ln( K/[1-exp(-K)] ).

It is very interesting and pleasant that this formula for y-n is independent of n. This new apportionment method is (by its derivation) unbiased in our probabilistic model not only for all states en masse in the sense that neither large nor small states have any systematic advantage, but also, more strongly, is unbiased for each state-subset in each quota-interval [n, n+1] in the sense the expected additive adjustment of the number of seats caused by rounding is zero. In the K→0+ limit it reduces to the Webster formula y=n+½, and in the K→∞ limit it reduces to the Adams formula y=n+0; for intermediate K it provides intermediate formulae.

Our simple apportionment proposal does not provide a 1-seat minimum; it will award small-enough states zero seats. If you want to have a 1-seat minimum, then the first rounding point y (between 0 and 1) needs to be replaced by y=0. The other rounding points y then are still given by the usual formula y=n+d, for n=1,2,3,... but now with a different value of d. Again you can choose d to optimize historical performance, or in the same theoretical model we have the exact formula

d = [ ln(K) - ln(exp(K)-1) + ln(1/(1-K)) ]/K

which when K=50/435 yields d≈0.557504703. [Note this formula only is valid when 0<K<1.] This formula was derived by solving for y to zero

0≤x≤1 (1-x) exp(-xK) dx + ∑1≤n≤∞ [ ∫y≤x≤n+1 (n+1-x) exp(-xK) dX - ∫n≤x≤y (x-n) exp(-xK) dx ].

The resulting apportionment method then is only unbiased for all states en masse (albeit if K>1 then unbias is unachievable) and not for each state-subset in each quota-interval [n, n+1] individually.

3. Alternative (less simple) theoretical models, goals, and results

It is also possible to start from other underlying simplistic probabilistic models besides i.i.d. exponential state-population densities. I shall argue soon that these other models make less sense, but disregarding that, one can proceed anyway and see what formulas result. Also, we can work toward different goals than zeroing the expected additive adjustment to the quota (caused by rounding-to-integer); we instead could, e.g, zero the expected additive adjustment to the log of the quota. We have worked out several examples, some of which will be given here. They all yield rather more complicated formulas, which unfortunately always depend on n:

1. If we aim to null out the expected additive adjustment in a state's log(#seats) by choice of y, then solving

n≤x≤y [ln(x)-ln(n)] exp(-xK) dx - ∫y≤x≤n+1 [ln(n+1)-ln(x)] exp(-xK) dx = 0
y = (1/K) ln( ln(1+1/n) / [Ei1(Kn)-Ei1(Kn+K)] )
which in the K→0+ limit reduces to y→1/ln(1+1/n), [whose asymptotic behavior for large n is y=n+½-1/(12n)+1/(24n²)-19/(720n³) + O(n-4)] which is close to, but not the same as, the Huntington-Hill formula y=√((n+1)n). This new formula is a little more favoring to large states than Huntington-Hill (which is good because both computer simulations and historical evidence indicate HH is biased in favor of small states) and like HH it automatically naturally provides a 1-seat minimum. (Our simpler proposal does not provide a 1-seat minimum, albeit one could be artificially imposed and then our simpler proposal would still be valid and unbiased if used with a different value of the magic constant d – albeit now only unbiased for all states en masse and not for each state-subset in each quota-interval [n, n+1].)

Here the "exponential integrals" Eim(z) for m=1,2,3,... are defined in the halfplane Re(z)>0 by

Eim(z) = ∫1≤u<∞ exp(-u z) u-m du.

2. If we aim to null out the expected additive adjustment in a state's seat count, divided by the state's population, then we ask for y so that

A≤x≤y (1-A/x) exp(-Kx) dx = ∫y≤x≤B (B/x-1) exp(-Kx) dx
then we get a nasty transcendental equation involving "exponential integrals" (higher transcendental functions) to solve numerically for y. Specifically the equation is
(A-B) K Ei1(Ky) + exp(-KA) - exp(-KB) - AK Ei1(AK) + BK Ei1(BK) = 0.
Here again we assume A and B are consecutive integers A=n, B=n+1, and as usual K=#states/#seats to make the exponential density have the right expectation value. There is no close-form solution to that, but in the limit K→0+, the equation simplifies to
y-A - A ln(y/A) = B ln(B/y) + y-B;
with solution
y = (1/e) BB/AA,
a nice formula first written down by Michael Ossipoff (and please use limA→0+AA=1). Ossipoff called this formula the "bias-free method." In this formula, Euler's constant "e" (≈2.71828) plays a very nice role. The point is that A and B are consecutive integers A=n, B=n+1 and the limit behavior as n→∞ of
[n + 1/2 - 1/(24n) + 1/(48n²) - 73/(5760n³) + O(n-4)] e
So the e is exactly what we need to make the Ossipoff breakpoint y tend to the Webster breakpoint y=n+½.

3. If we aim to null out the expected additive adjustment in a state's population/seats ratio (i.e. in its "district size"), then we ask for y so that

A≤x≤y (x/A-1)exp(-Kx) dx = ∫y≤x≤B (1-x/B)exp(-Kx) dx
then we get that y solves this equation (which needs to be solved numerically or in terms of the "Lambert W function"):
B exp(-KA) - A exp(-KB) + (A-B) exp(-Ky) (1+Ky) = 0
Here again we shall assume A and B are consecutive integers A=n, B=n+1, and as usual K=#states/#seats to make the exponential density have the right expectation value. There is no close-form solution to that, but in the limit K→0+, the equation simplifies to
B exp(-KA) - A exp(-KB) + (A-B) (1-Ky) (1+Ky) = O(K2)
with limit-solution (using the fact B-A=1; this solution actually then solves the original exact equation even with O(K3) as the right hand side)
y = √(AB),
which is the Huntington-Hill apportionment method! And note
y = ([n+1]n)1/2 = n + 1/2 - 1/(8n) + 1/(16n²) - O(n-3)

4. If we aim to null out the expected additive adjustment in a state's population/seats ratio (i.e. in its "district size") times its population, then we ask for y so that

A≤x≤y x(x/A-1)exp(-Kx) dx = ∫y≤x≤B x(1-x/B)exp(-Kx) dx
then we get that y solves this equation (which needs to be solved numerically):
2Bexp(-KA) - 2Aexp(-KB) + BAK[exp(-KA)-exp(-KB)] + 2(A-B)exp(-Ky) + 2(A-B)exp(-Ky)Ky + (A-B)exp(-Ky)(Ky)2 = 0
Here again we shall assume A and B are consecutive integers A=n, B=n+1, and as usual K=#states/#seats to make the exponential density have the right expectation value. There is no close-form solution to that, but in the limit K→0+, the equation simplifies to
(B+A)BA - 2 y3 = O(K4)
with solution
y = (AB[B+A]/2)1/3,
which is the geometric mean of A, B, and their arithmetic mean and thus in a sense is intermediate between the Webster and Huntington-Hill methods. Its asymptotics are
y = [n(n+1)(n+½)]1/3 = n + 1/2 - 1/(12n) + 1/(24n²) - 1/(36n³) + O(n-4).
which agrees with a previous asymptotic series (for alternative method 1) up to the n-2 term but disagrees at the n-3 term.

5. If we aim to null out the expected additive adjustment in a state's log(#seats) times its population by choice of y, then

A≤x≤y [ln(x)-ln(A)] x exp(-xK) dx - ∫y≤x≤B [ln(B)-ln(x)] x exp(-xK) dx = 0
yields the equation (which needs to be solved numerically for y)
ln(A) yK exp(-yK) - ln(B) yK exp(-yK) + ln(A) exp(-yK) - ln(B) exp(-yK) + Ei1(AK) - Ei1(BK) + exp(-AK) - exp(-BK) = 0 .
In the K→0+ limit [if we replace the 0 by O(K3) and solve for y, and use the assumption that B-A=1 to simplify the result], this is
y = ((B+A)/[2ln(B/A)])1/2 = [(n+½)/ln(1+1/n)]1/2
if B=n+1 and A=n. This expression for y, for n≥0, always lies between n and n+½ and approaches the former when n→0+ and the latter when n→∞. Indeed its asymptotic behavior when n→∞ is
n + 1/2 - 1/(24n) + 1/(48n²) - 9/(640n³) + O(n-4)
which agrees with a previous asymptotic series (for Ossipoff's formula) up to the n-2 term but disagrees at the n-3 term.

6. If we aim to null out the expected additive adjustment in a state's seat count, divided by the square root of the state's population, then we ask for y so that

A≤x≤y [(x)1/2-(A/x)1/2] exp(-xK) dx - ∫y≤x≤B [(B/x)1/2-(x)1/2] exp(-xK) dx = 0.
This yields the equation (which needs to be solved numerically for y)
BK erf(√K √y) -AK erf(√K √y) +(AK/π)1/2exp(-AK) -(BK/π)1/2exp(-BK)
+erf(√K √B)/2 -erf(√K √A)/2 +AK erf(√K √A) -BK erf(√K √B) =0.
In the limit K→0+ and using the fact that B=A+1, this simplifies to
y = (4/9) (B3/2-A3/2)2
which when A=n and B=n+1 becomes
y = n + 1/2 - 1/(48n) + 1/(96n²) - 7/(1152n³) + O(n-4)
in the n→∞ limit.

4. Which one is the most "morally right"? Smith: Alternative method #2.

At the present moment it seems to me that alternative method #2 is the most "morally right" in terms of its underlying theoretical model/goals. However, that is merely my subjective impression. Here is my reasoning:

That method aims to null out the "bias" which (for that method) is defined as the expected |additive change| in the number of seats a state has (caused by the round to integer) divided by state population. Why is this "morally right"? Well, it seems that the unfairness, for an individual person in state k, of having X times as many congressmen than he ideally should, is

At first I thought that X=1/10 should be equally unfair as X=10 and hence was considering using unfairness formulas involving |log(X)|. However, I later thought X=10, X=100, etc keeps getting more and more unfair proportionally roughly to X, while X decreasing X=0.1, X=0.01 etc does not get much more unfair, and when X=0 it is essentially the same badness as X=0.1, whereas X=100 really is about 10 times worse than X=10.

Now we have to weight this unfairness per person, by the number of people in the state, which means we multiply by Pk thus cancelling out the denominator to get |X-1|.

But that is exactly what the derivation of the method nulled out, because |X-1| is proportional to the |additive change| in the number of seats, divided by state population.

5. Which one is "morally best"? Ossipoff: the original method is most moral (as well as being the simplest)

But Mike Ossipoff is (justifiably) dubious of my "quantified morality" above and currently thinks the original method is the morally best of those here. He argues against me:

You said it was about unfairness per person, which justified dividing by population. But unfairness sometimes isn't something that has to be divided among its recipients. When we're unfair to a state, all of its residents get all of the unfairness, and they don't have to divide it up... Equal representation expectation per person – shouldn't that be the goal for all of us? The original method can be derived from that goal.

It is also possible to take a compromise path "midway between" the Ossipoff and Smith views (i.e. if Ossipoff divides by 1 and Smith by x, then the compromise is to divide by √x), and the result of that would be alternative method #6. More generally one could consider division by an arbitrary power xμ of x, whereupon the resulting integrals would be expressible in terms of incomplete gamma functions.

6. Why only exponential deviates make sense as an underlying simplistic statistical model

There are many reasons for that, well known to mathematicians and statisticians and associated with such names as Dirichlet, Poisson, and Shannon. We shall explore them incompletely.

First, there is the observation of Dirichlet. Suppose you want to split up a country's total population P into N state-populations Pk, k=1,...,N (N fixed), so that ∑k=1..NPk=P and Pk≥0. This set is an (N-1)-dimensional "simplex" in N-dimensional space. The goal of choosing a point randomly from this set (and uniformly so that equal N-1-dimensional areas are equally likely) can be attained by choosing each state population to be an i.i.d. exponential random deviate, and then normalizing the N state populations so that their sum comes out right (to the desired value P). In this process, only exponential random variates work. Every other 1-dimensional density function does not work.

Second, there is the observation of Poisson. Suppose you wish to drop N points randomly and uniformly in the interval [0,N/K] of the real line, where N→∞. This is a model of "density-K random points." Then what is the probability density for the gap between two consecutive points? The answer is that it is exponential: Kexp(-xK) dx. Indeed, it is not even necessary to take the limit N→∞; we can make a very similar statement about each finite N. By that we mean that the following algorithm will generate N random uniform in [0,1] numbers X1..N, in sorted order without actually performing a sort:

s = 0;
for k=1 to N do
   s = s + ExponentialRandomDeviate;
   Xk = s;
end loop;
s = s + ExponentialRandomDeviate;
for k=1 to N do
   Xk = Xk / s;
end loop;

Our point is, that the gaps between the consecutive Xk (including the initial gap from 0 to X1 and the final one from XN to 1) generated in this way are i.i.d. exponential random deviates, aside from the rescaling at the end.

We also can get exponentials from a discrete model. Suppose we consider all the possible ways of placing P indistinguishable balls into N distinguishable boxes. How many are there? The answer is (P+N-1)!/[(N-1)!P!]. Now if all these configurations are considered equally likely, then in a limit where P→∞ and N→∞ with the average number P/N of balls per box held fixed, what is the probability distribution for the number of balls in one particular box? The answer is it tends to a (now discrete) exponential distribution on the non-negative integers, with mean P/N.

In both of the cases above, what is going on is we are splitting a total population (modeled as "indistinguishable balls" or "the real interval [0,N/K]" into N "state" populations subject only to the constraints that they be nonnegative and have the correct sum, and we are finding the state populations (in a limit) are exponentially distributed. Because in the USA P=3×108 is large, and N=50 is fairly large, the asymptotic limits are fairly well satisfied in practice, albeit the assumption the state boundaries are drawn to split up the population "randomly" is of course nonsense (but if you are going to make a simplistic probabilistic model, this is the one).

Third, and finally, there is Claude Shannon. The probability density function on the positive real axis with given mean 1/K and maximum entropy is the exponential density Kexp(-KX)dx (and similarly the discrete exponential has maximum entropy on the nonnegative integers with specified mean). Maximum entropy is a formal and quantitative version of Occam's razor. That is, this is the "simplest" assumption about a non-negative random variable with a mean, i.e. the assumption which "assumes the least."

7. Open questions

Can we now devise some pairwise optimality theorem or global optimality theorem that these methods obey? (See for such theorems concerning the classic apportionment methods as well as an introduction to them generally.)

8. Sources

The following should be useful references both for terminology and for previous results and for related history (mostly US):

M.L. Balinski & H. Peyton Young: Fair Representation: Meeting the Ideal of One Person, One Vote (2nd edition), Brookings Institution Press 2001. Young also wrote a more accessible essay summarizing his thoughts for the layman.

Lawrence R. Ernst: Apportionment methods for the House of Representatives and Court Challenges, Management Science 40,10 (Oct. 1994) 1207-1227.

Warren D. Smith: Introduction to Apportionment methods web page.

The sorted random points without sorting algorithm is from:

J.Bentley & J.Saxe: Generating Sorted Lists of Random Numbers, ACM Trans. Math'l. Software 6,3 (1980) 359-364.

For standard results on probability distributions see the 5-volume set by N.L.Johnson & S.Kotz (now with N. Balakrishnan and Adrienne W. Kemp in recent editions) J.Wiley & Sons; for combinatorics R.P.Stanley: Enumerative Combinatorics, (Brooks Wadsworth, Cole), and for information theory and maximum entropy theory, T.M. Cover & J.A. Thomas: Elements of Information Theory, Wiley 1991.

I would also like to acknowledge Dan Bishop and Michael Ossipoff. Bishop programmed a computer apportionment simulator, and Ossipoff came up with the idea of trying to seek an unbiased apportionment method, although his execution of that idea was poor due to incorrect mathematical manipulations and/or confusing expositions (and Ossipoff also did not appreciate the desirability of the exponential density as an underlying model). The present work can be viewed as simply carrying out Ossipoff's idea, but now with correct execution and a better underlying model.

Return to main page