Puzzle:
In the random election model (also called the "impartial culture")
By Warren D. Smith, January 2009
a. Consulting the answer to puzzle #99, we find that candidate A defeats rival B if X_{AA}>X_{BB}+X_{AB} where all three quantities X_{AA}, X_{BB}, and X_{AB} are independent standard normal random deviates (mean=0, variance=1). We can equivalently regard X_{BB}+X_{AB} just as a single normal random deviate with mean=0 and variance=2. Hence A defeats all N-1 rivals if 2^{-1/2}X_{AA} is greater than the maximum among N-1 independent standard normal random deviates. Letting F'(u)=(2π)^{-1/2}exp(-u^{2}/2) be the standard normal density and F(u)=[1+erf(2^{-1/2}u)]/2 be its CDF (i.e. integral), we obtain the incredibly simple formula
To get the probability any Condorcet winner exists, we simply multiply by the number N of candidates. (That is valid by symmetry – all candidates are the same – and the fact there cannot be more than one Condorcet Winner so these probabilities are summable since they are for disjoint events.) Result:
P(N) = Prob(Condorcet Winner Exists among N candidates) = N∫_{-∞<u<∞} F(2^{-1/2}u)^{N-1}F'(u)du |
Using this formula we find
which each agree with computations by other authors by other methods to all decimal places stated. Note that
is sometimes called "Gilbaud's number" since G.Th.Gilbaud stated it in a footnote in 1952. Continuing on (using numerical integration which ought to be accurate to all decimal places stated) we find
It is trivial to see from our formula that P(N)→0 as N→∞. Robert M. May claimed, in a paper rife with errors and lacking in details [Some mathematical remarks on the paradox of voting, Behavioral Science 16 (1971) 143-151] that
in the N→∞ limit. May's formula predicts
This numerical evidence seems at least somewhat compatible with the hypothesis May's formula has relative error of order 1/lnN, so he probably did not get it hugely wrong. Still, we now shall derive our own formula and find May got the right behavior but wrong constant factor.
b. We now derive our own formula for the asymptotics of P(N) in a much simpler way than May by examining our integral and noting the integrand is everywhere positive and has a "peak" which almost all of the mass lies close to. In the large-N limit this peak is asymptotically Gaussian and the integrand is exponentially small when N is large if we are ≥ε>0 away from the peak. The strategy is to evaluate the location, height, and width of the peak to estimate the value of the integral using the usual formula for integrating a Gaussian peak. Unfortunately the details are rather messy, but still appear to be far simpler than May's (full and unstated) derivation.
Using the fact [Abramowitz & Stegun: Handbook of Mathematical Functions, eq 26.2.12] that
when x→+∞, our integrand is then
for large positive u (which is the only part of the integral that matters; the rest has exponentially small effect). We will not actually need the asymptotic series; merely its first term will suffice. The u which maximizes this integrand – the location of the "peak" – is approximately the same as the u which maximizes
(which is essentially just the negated integrand's logarithm, up to relatively neglectible terms) i.e. the u which solves
i.e. approximately the same (up to relatively neglectible terms) as the u which solves
i.e.
where the "Lambert W function" W(x) is defined by exp(W)W=x. [R.M.Corless, G.H.Gonnet, D.E.G.Hare, D.J.Jeffrey, D.E.Knuth: On The Lambert W Function, Advances in Computational Mathematics 5 (1996) 329-359.] This formula for the location of the maximum seems accurate to about 1 part in 300 (or better) when N≥100.
N | Actual Peak Locn | Approx Peak Locn |
---|---|---|
10 | 1.57 | 1.4744 |
100 | 2.9990 | 2.9896 |
1000 | 4.1130 | 4.1121 |
10000 | 5.0 | 5.0315 |
10^{5} | 5.8 | 5.8258 |
10^{6} | 6.5 | 6.5339 |
Those who dislike the Lambert W function and prefer more pedestrian functions can use the fact that W(x)∼ln(x)-lnln(x)+o(1) when x→+∞ to find
but these are much less accurate and I do not recommend them.
Via complicated but straightforward algebraic manipulations fortunately made easy using the MAPLE symbolic algebra system, one may now show that the width of the peak (that is, [-1/D]^{1/2} where D is the second derivative of the logarithm of the integrand evaluated at the peak location; this for a Gaussian is just its "standard deviation") is asymptotic to [2lnN]^{-1/2} for large N. And the height of the peak (that is, the maximum value of the integrand) is asymptotic to 8e^{-2}(2π)^{1/2}N^{-2}lnN.
The usual formula for the integral of a Gaussian with height H and width W is ∫Hexp(-u^{2}/(2W^{2}))du=(2π)^{1/2}HW. Upon using this [and remembering that the whole integral needs to be multiplied by N to get P(N)], we find as our asymptotic estimate
This agrees with May's estimate except for the constant factor.
May's constant factor is
This formula predicts
In every case our formula is closer to the truth than May's formula. Anybody sufficiently determined should be able (using our same "saddlepoint method" to analyse our integral) to compute a series of successively-smaller-order correction terms to get even more asymptotic accuracy, see e.g. [C.M.Bender & S.A.Orszag: Advanced Mathematical Methods for Scientists and Engineers, McGraw-Hill 1978, reprinted Springer 1999].
One more-accurate formula is simply to use our highly-accurate formula (above, involving the Lambert W function) for ApproxPeakLocation(N); i.e. we define
and
and then
This more-accurate formula predicts
This indeed is more accurate than our preceding formula (which in turn was more accurate than May) in every case. Since it is accurate to ≤5% relative |error| for all the tabulated N, presumably it has ≤5% relative |error| for every N≥1.
c. This theorem (that the probability that a Condorcet Winner exists is less than N^{-0.99} for all sufficiently large N in the Dirichlet model) is proved as "theorem 2" in Anthony Quas: Anomalous outcomes in preferential voting (pdf), Stochastics & Dynamics 4,1 (2004) 95-105.