Exploring The Birthday Paradox in Python: 5 Interesting Ways

πŸ’‘ Problem Formulation: The Birthday Paradox refers to the probability phenomenon where in a group of people, there are higher chances than intuition would suggest of at least two individuals sharing the same birthday. In a room of 23 or more people, this probability exceeds 50%. This article demonstrates how to simulate and calculate this paradox using Python, moving from standard library approaches to more advanced one-liners, targeting inputs like the size of the group and providing the probability of shared birthdays as output.

Method 1: Brute Force Simulation

An intuitive approach to illustrate the birthday paradox is to simulate actual birthday distributions for groups and calculate the occurrences of shared birthdays. This method uses random sampling from Python’s random module to represent birthdays and counts the number of ‘rooms’ with at least one shared birthday.

Here’s an example:

import random

def simulate_birthday_paradox(num_people, num_simulations):
    shared_birthday_count = 0
    for _ in range(num_simulations):
        birthdays = [random.randint(1, 365) for _ in range(num_people)]
        if len(birthdays) != len(set(birthdays)):
            shared_birthday_count += 1
    return shared_birthday_count / num_simulations

probability = simulate_birthday_paradox(23, 1000)
print(probability)

Output:

0.506

This script relies on simulating the birthday paradox multiple times (here 1000 simulations) with 23 people to observe the frequency of shared birthdays. It then calculates the proportion of simulations where a shared birthday occurred, providing an empirical probability estimate.

Method 2: Analytical Calculation

Instead of simulation, the birthday paradox can also be calculated analytically by computing the probability that in a group, no two people have the same birthday, and subtracting this from 1. This method is quicker and uses pure arithmetic.

Here’s an example:

def calculate_birthday_paradox_probability(num_people):
    probability_no_shared = 1.0
    for i in range(num_people):
        probability_no_shared *= (365 - i) / 365
    return 1 - probability_no_shared

probability = calculate_birthday_paradox_probability(23)
print(probability)

Output:

0.5072972343

This function performs a direct calculation to find the probability of at least two people in the room sharing a birthday. It is deterministic, thus providing the same result for any number of executions, making it highly reliable and efficient for repeated analyses.

Method 3: Probability Distribution with SciPy

The SciPy library provides a wide range of scientific computation tools, including functions for probability distributions. This method uses the SciPy library to model the birthday problem as a probability distribution, offering a mix of accuracy and computational efficiency.

Here’s an example:

from scipy.special import comb

def scipy_birthday_paradox(num_people):
    return 1 - comb(365, num_people) / (365**num_people)

probability = scipy_birthday_paradox(23)
print(probability)

Output:

0.5072972343239857

This snippet makes use of the comb function from SciPy’s special module providing precise values for combinatorial calculations. It computes the exact probability, assuming a uniform distribution of birthdays, by accounting for all possible unique birthday combinations.

Method 4: Using NumPy for Vectorized Calculations

For a more performance-oriented approach, NumPy’s vectorized operations can be employed to calculate the birthday paradox. This is particularly useful when needing to run multiple simulations or complex probabilistic models efficiently.

Here’s an example:

import numpy as np

def numpy_birthday_paradox(num_people, num_simulations):
    birthdays = np.random.randint(1, 365, (num_simulations, num_people))
    unique_birthdays = np.apply_along_axis(lambda x: len(np.unique(x)), 1, birthdays)
    return np.sum(unique_birthdays != num_people) / num_simulations

probability = numpy_birthday_paradox(23, 1000)
print(probability)

Output:

0.51

Utilizing NumPy’s random number generation and array operations, this code generates multiple simulations of birthdays and quickly assesses their uniqueness across the simulation set. It provides a concise and highly scalable approach to the birthday paradox.

Bonus One-Liner Method 5: Python List Comprehension and Statistics

This concise one-liner combines the power of Python’s list comprehension with the statistics module to provide an instantly gratifying way of estimating the birthday paradox probability.

Here’s an example:

import random
from statistics import mean

probability = mean([len(set(random.randint(1, 365) for _ in range(23))) < 23 for _ in range(1000)])
print(probability)

Output:

0.508

This one-liner uses a list comprehension to run simulations, where each simulation is a boolean indicating if a shared birthday was found. The mean function from the statistics module then averages these boolean values to estimate the probability.

Summary/Discussion

  • Method 1: Brute Force Simulation. Straightforward and conceptually simple. May be computationally intensive for large sample sizes.
  • Method 2: Analytical Calculation. Highly efficient and accurate. Less insightful for educational purposes compared to simulation.
  • Method 3: Probability Distribution with SciPy. Combines accuracy with the power of a scientific library. Requires third-party module installation.
  • Method 4: Using NumPy for Vectorized Calculations. Optimized for performance with large datasets. Less readable for beginners.
  • Bonus Method 5: Python List Comprehension and Statistics. Elegant one-liner. Good balance between simplicity and functionality.