Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Birthday Paradox, incorrect output by about 1

Tags:

python

I'm relatively new to python and wanted to test myself, by tackling the birthday problem. Rather than calculating it mathematically, I wanted to simulate it, to see if I would get the right answer. So I assigned all boolean values in the list sieve[] as False and then randomly pick a value from 0 to 364 and change it to True, if it's already True then it outputs how many times it had to iterate as an answer.

For some reason, every time I run the code, I get a value between 24.5 and 24.8

The expected result for 50% is 23 people, so why is my result 6% higher than it should be? Is there an error in my code?

import random

def howManyPeople():
    sieve = [False] * 365
    count = 1
    while True:
        newBirthday = random.randint(0,364)
        if sieve[newBirthday]:
            return count
        else:
            sieve[newBirthday] = True
            count += 1

def multipleRun():
    global timesToRun
    results = []
    for i in range(timesToRun):
        results.append(howManyPeople())
    finalResultAverage = sum(results)
    return (finalResultAverage / timesToRun)

timesToRun = int(input("How many times would you like to run this code?"))

print("Average of all solutions = " + str(multipleRun()) + " people")
like image 943
Evan Lewis Avatar asked Sep 07 '18 17:09

Evan Lewis


People also ask

Is the birthday paradox correct?

The birthday paradox is strange, counter-intuitive, and completely true. It's only a “paradox” because our brains can't handle the compounding power of exponents. We expect probabilities to be linear and only consider the scenarios we're involved in (both faulty assumptions, by the way).

Why is the birthday paradox true?

Due to probability, sometimes an event is more likely to occur than we believe it to. In this case, if you survey a random group of just 23 people there is actually about a 50–50 chance that two of them will have the same birthday. This is known as the birthday paradox.

What is the formula for the birthday paradox?

So the chance that two people don't share a birthday is (365×364)/365². Subtract that from 1 and you get what you expect: that there's a 1 in 365 chance that two people share a birthday.

What is the probability that 2 persons will have the same birthday 1 year 365 days?

The chance of: two people sharing a birthday would be 1 - (364/365), or 0.3%, or 1 in 370.


2 Answers

There's no error in your code. You're computing the mean of your sample of howManyPeople return values, when what you're really interested in (and what the birthday paradox tells you about) is the median of the distribution.

That is, you've got a random process where you incrementally add people to a set, then report the total number of people in that set on the first birthday collision. The birthday paradox implies that at least 50% of the time, your set will have 23 or fewer people. That's not the same thing as saying the expected number of people in the set is 23.0 or smaller.

Here's what I see from one million samples of your howManyPeople function.

In [4]: sample = [howManyPeople() for _ in range(1000000)]

In [5]: import numpy as np

In [6]: np.median(sample)
Out[6]: 23.0

In [7]: np.mean(sample)
Out[7]: 24.617082

In [8]: np.mean([x <= 23 for x in sample])
Out[8]: 0.506978

Note that there's a (tiny) amount of luck here: the median of the distribution of howManyPeople return values is 23 (at least according to Wikipedia's definition), but there's a chance that an unusual sample could have different median, purely through randomness. In this particular case, that chance is entirely negligible. And as user2357112 points out in comments, things are a bit messier in the 2-day year example, where any real number between 2.0 and 3.0 (inclusive) is a valid distribution median, and we could reasonably expect a sample median to be either 2 or 3.

Instead of sampling, we can also compute the probabilities of each output of howManyPeople directly: for any positive integer k, the probability that the output is strictly larger than k is the same as the probability that the first k people have distinct birthdays, which is given (in Python syntax) by factorial(365)/factorial(k)/365**k, and we can use that to compute the probabilities of individual outputs. Here I'm using the name X for the random variable represented by howManyPeople. Some inefficient code:

from math import factorial

def prob_X_greater_than(k):
    """Probability that the output of howManyPeople is > k."""
    if k <= 0:
        return 1.0
    elif k > 365:
        return 0.0
    else:
        return factorial(365) / factorial(365 - k) / 365**k

def prob_X_equals(k):
    """Probability that the output of howManyPeople is == k."""
    return prob_x_greater_than(k-1) - prob_x_greater_than(k)

With this, we can get the exact (well, okay, exact up to numerical errors) mean and verify that it roughly matches what we got from the sample:

In [18]: sum(k*prob_x_equals(k) for k in range(1, 366))
Out[18]: 24.616585894598863

And the birthday paradox in this case should tell us that the sum of the probabilities for k <= 23 is greater than 0.5:

In [19]: sum(prob_x_equals(k) for k in range(1, 24))
Out[19]: 0.5072972343239854
like image 82
Mark Dickinson Avatar answered Oct 03 '22 03:10

Mark Dickinson


What you're seeing is normal. There may be a >50% chance of having a duplicate birthday in a room of 23 random people (ignoring leap years and nonuniform birthday distributions), but that doesn't mean that if you add people to a room one by one, the mean point at which you get a duplicate will be 23.

To get an intuitive feel for this, imagine if years only had two days. In this case, it's clear that there's a 50% chance of having a duplicate birthday in a room with 2 people. However, if you add random people to the room one by one, you're going to need at least two people - 50% chance of stopping at 2 and 50% of 3. The mean stopping point is 2.5, not 2.

like image 27
user2357112 supports Monica Avatar answered Oct 03 '22 02:10

user2357112 supports Monica