RangeVoting.org - answer to puzzle 102 - Probability Condorcet Winner exists

Puzzle #102: Probability Condorcet Winner exists

Puzzle:
In the random election model (also called the "impartial culture")

Find an exact formula, expressed as a 1-dimensional integral, for the probability P(N) that a "Condorcet Winner" (i.e. a candidate who pairwise-defeats all the others by majority-vote) exists in a V-voter N-candidate election in the V→∞ limit. [Hint: it will help immensely to have solved puzzle #99 first.]
Then find a simpler asymptotic formula, valid when N→∞, not involving any integrals.
Your formula should prove that the probability that a Condorcet Winner exists is less than N^-0.99 for all sufficiently large N. Prove this is also true in the Dirichlet model.

Answers

By Warren D. Smith, January 2009

Exact Formula for P(N)

a. Consulting the answer to puzzle #99, we find that candidate A defeats rival B if X_AA>X_BB+X_AB where all three quantities X_AA, X_BB, and X_AB are independent standard normal random deviates (mean=0, variance=1). We can equivalently regard X_BB+X_AB just as a single normal random deviate with mean=0 and variance=2. Hence A defeats all N-1 rivals if 2^-1/2X_AA is greater than the maximum among N-1 independent standard normal random deviates. Letting F'(u)=(2π)^-1/2exp(-u²/2) be the standard normal density and F(u)=[1+erf(2^-1/2u)]/2 be its CDF (i.e. integral), we obtain the incredibly simple formula

Prob(A is the Condorcet Winner) = ∫_-∞<u<∞ F(2^-1/2u)^N-1F'(u)du.

To get the probability any Condorcet winner exists, we simply multiply by the number N of candidates. (That is valid by symmetry – all candidates are the same – and the fact there cannot be more than one Condorcet Winner so these probabilities are summable since they are for disjoint events.) Result:

P(N) = Prob(Condorcet Winner Exists among N candidates) = N∫_-∞<u<∞ F(2^-1/2u)^N-1F'(u)du

Using this formula we find

P(1)=P(2)=1, P(3)≈91.2260172%, P(4)≈82.4520344%, P(5)≈74.8688265%, P(6)≈68.4763934%, P(7)≈63.0816461%, P(8)≈58.4914955%, P(9)≈54.5473641%

which each agree with computations by other authors by other methods to all decimal places stated. Note that

1-P(3) = [1-P(4)]/2 = [3arcsec(3)-π]/(2π) ≈ 0.087739828045910905

is sometimes called "Gilbaud's number" since G.Th.Gilbaud stated it in a footnote in 1952. Continuing on (using numerical integration which ought to be accurate to all decimal places stated) we find

P(10)≈51.125186%, P(100)≈8.8136522%, P(1000)≈1.1480128%, P(10000)≈0.1365%, P(10⁵)≈0.01554%, P(10⁶)≈0.001723%.

It is trivial to see from our formula that P(N)→0 as N→∞. Robert M. May claimed, in a paper rife with errors and lacking in details [Some mathematical remarks on the paradox of voting, Behavioral Science 16 (1971) 143-151] that

P(N) ∼ [8πlnN]^1/2/N

in the N→∞ limit. May's formula predicts

P(10)∼76.07%, P(100)∼10.758%, P(1000)∼1.3176%, P(10000)∼0.1521%, P(10⁵)∼0.01701%, P(10⁶)∼0.001863%.

This numerical evidence seems at least somewhat compatible with the hypothesis May's formula has relative error of order 1/lnN, so he probably did not get it hugely wrong. Still, we now shall derive our own formula and find May got the right behavior but wrong constant factor.

Asymptotic behavior of P(N)

b. We now derive our own formula for the asymptotics of P(N) in a much simpler way than May by examining our integral and noting the integrand is everywhere positive and has a "peak" which almost all of the mass lies close to. In the large-N limit this peak is asymptotically Gaussian and the integrand is exponentially small when N is large if we are ≥ε>0 away from the peak. The strategy is to evaluate the location, height, and width of the peak to estimate the value of the integral using the usual formula for integrating a Gaussian peak. Unfortunately the details are rather messy, but still appear to be far simpler than May's (full and unstated) derivation.

Using the fact [Abramowitz & Stegun: Handbook of Mathematical Functions, eq 26.2.12] that

1-F(x) ∼ F'(x)/x · [1 - x^-2 + 3x^-4 - 15x^-6 + 105x^-8 -...]

when x→+∞, our integrand is then

(2π)^-1/2 [1 - u^-1π^-1/2exp(-u²/4) (1 - 2u^-2 + 2²3u^-4 - 2³15u^-6 +...) ]^N-1 exp(-u²/2) du

for large positive u (which is the only part of the integral that matters; the rest has exponentially small effect). We will not actually need the asymptotic series; merely its first term will suffice. The u which maximizes this integrand – the location of the "peak" – is approximately the same as the u which maximizes

(N-1)u^-1π^-1/2exp(-u²/4) + u²/2 + ln(2π)/2

(which is essentially just the negated integrand's logarithm, up to relatively neglectible terms) i.e. the u which solves

u = π^-1/2(N-1)[2^-1+u^-2]exp(-u²/4)

i.e. approximately the same (up to relatively neglectible terms) as the u which solves

2 π^1/2 u = (N-1) exp(-u²/4)

i.e.

u_peak ≈ ApproxPeakLocation(N) = π^-1/2(N-1)/2 · exp( -W((N-1)²/(8π))/2 )

where the "Lambert W function" W(x) is defined by exp(W)W=x. [R.M.Corless, G.H.Gonnet, D.E.G.Hare, D.J.Jeffrey, D.E.Knuth: On The Lambert W Function, Advances in Computational Mathematics 5 (1996) 329-359.] This formula for the location of the maximum seems accurate to about 1 part in 300 (or better) when N≥100.

Actual vs Approximate peak locations.
N	Actual Peak Locn	Approx Peak Locn
10	1.57	1.4744
100	2.9990	2.9896
1000	4.1130	4.1121
10000	5.0	5.0315
10⁵	5.8	5.8258
10⁶	6.5	6.5339

Those who dislike the Lambert W function and prefer more pedestrian functions can use the fact that W(x)∼ln(x)-lnln(x)+o(1) when x→+∞ to find

u_peak ≈ 2^1/2 [2ln(N-1)-ln(8π)]^1/2 + o(1) ∼ 2(lnN)^1/2

but these are much less accurate and I do not recommend them.

Via complicated but straightforward algebraic manipulations fortunately made easy using the MAPLE symbolic algebra system, one may now show that the width of the peak (that is, [-1/D]^1/2 where D is the second derivative of the logarithm of the integrand evaluated at the peak location; this for a Gaussian is just its "standard deviation") is asymptotic to [2lnN]^-1/2 for large N. And the height of the peak (that is, the maximum value of the integrand) is asymptotic to 8e^-2(2π)^1/2N^-2lnN.

The usual formula for the integral of a Gaussian with height H and width W is ∫Hexp(-u²/(2W²))du=(2π)^1/2HW. Upon using this [and remembering that the whole integral needs to be multiplied by N to get P(N)], we find as our asymptotic estimate

P(N) ∼ 2^1/28πe^-2 (lnN)^1/2/N

This agrees with May's estimate except for the constant factor. May's constant factor is (8π)^1/2≈5.01326 whereas ours is 2^1/28πe^-2≈4.81023.

This formula predicts

P(10)∼72.99%, P(100)∼10.323%, P(1000)∼1.2643%, P(10000)∼0.1460%, P(10⁵)∼0.01632%, P(10⁶)∼0.001788%.

In every case our formula is closer to the truth than May's formula. Anybody sufficiently determined should be able (using our same "saddlepoint method" to analyse our integral) to compute a series of successively-smaller-order correction terms to get even more asymptotic accuracy, see e.g. [C.M.Bender & S.A.Orszag: Advanced Mathematical Methods for Scientists and Engineers, McGraw-Hill 1978, reprinted Springer 1999].

One more-accurate formula is simply to use our highly-accurate formula (above, involving the Lambert W function) for ApproxPeakLocation(N); i.e. we define

Hgt(A) = F(2^-1/2A)^N-1F'(A)

and

Wid(A) = [-(d²/dA²) ln Hgt(A)]^-1/2 =
= [(N-1)F'(A)²F'(A/√2)² - (N-1)F'(A)²F''(A/√2)F(A/√2) - 2F'(A)F'''(A)F(A/√2)² + 2F''(A)²F(A/√2)²]^-1/2 2^1/2 F'(A/√2)F(A)

and then

P(N) ≈ (2π)^1/2 N Hgt(ApproxPeakLocation(N)) Wid(ApproxPeakLocation(N)).

This more-accurate formula predicts

P(1)∼100%, P(2)∼97.6613%, P(3)∼87.3804%, P(4)∼78.4376%, P(5)∼71.1734%, P(6)∼65.1984%, P(7)∼60.1948%,
P(10)∼49.086%, P(100)∼8.610%, P(1000)∼1.1195%, P(10000)∼0.1327%, P(10⁵)∼0.01507%, P(10⁶)∼0.001669%.

This indeed is more accurate than our preceding formula (which in turn was more accurate than May) in every case. Since it is accurate to ≤5% relative |error| for all the tabulated N, presumably it has ≤5% relative |error| for every N≥1.

c. This theorem (that the probability that a Condorcet Winner exists is less than N^-0.99 for all sufficiently large N in the Dirichlet model) is proved as "theorem 2" in Anthony Quas: Anomalous outcomes in preferential voting (pdf), Stochastics & Dynamics 4,1 (2004) 95-105.

Return to puzzles

Return to main page