Sample a Random Number from a Probability Distribution in Python

Problem Formulation

Challenge: Given a list of numbers and a matching list of probabilities, how do you randomly select numbers from the list according to that probability distribution?

When you select a number randomly from a list using a given probability distribution, each number is chosen with a likelihood proportional to its relative weight (probability). Let's visualize this with the help of an example.

Example:

Given:
numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

Expected Output: Choose elements randomly from the given list and display 5 of them in the output list:
[30, 10, 20, 30, 30] 

Note: The output varies from run to run because the selection is random.

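In this sample output, the number 30 appears three times, which is plausible because it has the highest weight/probability. It is not guaranteed, though, since each draw is random. The relative weights assigned to 10, 20 and 30 are 0.3, 0.2 and 0.5, respectively. This means: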

Chances of selecting 10 are 30%.
Chances of selecting 20 are 20%.
Chances of selecting 30 are 50%.
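
These percentages describe long-run frequencies, so a short sample of 5 can deviate from them noticeably. As a rough check, the sketch below draws a large sample with random.choices() (introduced in Method 1) and compares observed frequencies with the given weights. The sample size of 100,000 and the fixed seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

random.seed(42)  # fixed seed only so that the run is repeatable
draws = random.choices(numbers, weights=distributions, k=100_000)

# Count how often each number was drawn and convert to a frequency.
counts = Counter(draws)
frequencies = {n: counts[n] / len(draws) for n in numbers}

for n, p in zip(numbers, distributions):
    print(f"{n}: expected {p:.2f}, observed {frequencies[n]:.3f}")
```

With a sample this large, the observed frequencies should land close to 0.3, 0.2 and 0.5.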

Note: We will first look at several ways of solving this problem and then work through a couple of exercises for further clarity.

Method 1: Using random.choices

choices() is a function of the random module in Python (available since Python 3.6) that returns a list of items selected randomly, with replacement, from the specified sequence. This sequence can be a list, tuple, string, or any other kind of sequence.
The probability of picking each item can be specified using the weights or the cum_weights parameter.

Syntax:
random.choices(population, weights=None, *, cum_weights=None, k=1)

Parameter	Description
population	– It is a mandatory parameter. – Represents the sequence to sample from, such as a range of numbers, a list, a tuple, etc.
weights	– It is an optional parameter. – Represents a list of relative weights, one per item, that determine how likely each value is to be picked. – By default, it is None.
cum_weights	– It is an optional, keyword-only parameter. – Represents a list of cumulative weights, where each entry is the running total of the relative weights up to that item. For example, the relative weights `[2, 3, 5]` are equivalent to the cum_weights `[2, 5, 10]`. – By default, it is None.
k	– It is an optional, keyword-only parameter. – Represents an integer that determines the length of the returned list. – By default, it is 1.
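
The relationship between relative and cumulative weights can be checked with a small sketch. It uses itertools.accumulate to build the running totals and then compares both ways of calling random.choices() under the same seed. The seed value and k=10 are arbitrary choices made for illustration.

```python
import itertools
import random

numbers = [10, 20, 30]
weights = [2, 3, 5]

# Running totals of the relative weights: 2, 2+3, 2+3+5
cum_weights = list(itertools.accumulate(weights))
print(cum_weights)  # [2, 5, 10]

# Re-seed before each call so both calls consume the same random stream.
random.seed(1)
by_weights = random.choices(numbers, weights=weights, k=10)
random.seed(1)
by_cum_weights = random.choices(numbers, cum_weights=cum_weights, k=10)

print(by_weights)
print(by_cum_weights)
```

Both lists are expected to be identical, since the relative weights are turned into cumulative weights internally before any selection is made.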

Approach: Call the random.choices() function and feed in the given list and the weights/probability distributions as parameters.

Code:

import random
numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
random_number = random.choices(numbers, distributions, k=5)
print(random_number)

Output:

[10, 30, 30, 10, 20]
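
To build intuition for what weighted selection does, it can also be sketched by hand: accumulate the weights, draw a uniform random number below the total, and find the first cumulative weight that exceeds it. This is a minimal illustration, not the actual source of random.choices(); weighted_pick is a hypothetical helper name introduced here.

```python
import bisect
import itertools
import random

def weighted_pick(items, weights):
    """Pick one item with probability proportional to its weight (illustrative helper)."""
    cumulative = list(itertools.accumulate(weights))  # e.g. 0.3, 0.5, 1.0
    threshold = random.random() * cumulative[-1]      # uniform in [0, total)
    # Index of the first cumulative weight greater than the threshold;
    # the upper bound guards against floating-point edge cases.
    index = bisect.bisect_right(cumulative, threshold, 0, len(items) - 1)
    return items[index]

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

random.seed(0)  # fixed seed only so that the run is repeatable
picks = [weighted_pick(numbers, distributions) for _ in range(20_000)]
share_of_30 = picks.count(30) / len(picks)
print(f"share of 30: {share_of_30:.3f}")
```

The share of 30 should come out near 0.5, matching its weight.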

Caution:

If neither relative nor cumulative weights are specified, the random.choices() function selects elements with equal probability.
The specified weights must always be of the same length as the specified sequence; otherwise, a ValueError is raised.
If you specify relative weights as well as cumulative weights at the same time, you will get a TypeError (TypeError: Cannot specify both weights and cumulative weights). Hence, to avoid the error, do not specify both at the same time.
The cum_weights or weights can only be integers, floats, and fractions. They cannot be decimal.Decimal values. Also, you must ensure that the weights are non-negative.
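
The cautions above can be reproduced with a short sketch. The helper attempt() is a hypothetical name used only for this illustration; it calls random.choices() with the given keyword arguments and records which exception type, if any, was raised. The exact error messages may differ between Python versions, so only the exception types are recorded.

```python
import random
from decimal import Decimal

numbers = [10, 20, 30]
errors = {}  # label -> name of the exception raised (or None)

def attempt(label, **kwargs):
    """Call random.choices() and record the exception type, if any."""
    try:
        random.choices(numbers, **kwargs)
    except (TypeError, ValueError) as exc:
        errors[label] = type(exc).__name__
        print(f"{label}: {type(exc).__name__}: {exc}")
    else:
        errors[label] = None
        print(f"{label}: no error")

# Relative and cumulative weights given together
attempt("both", weights=[0.3, 0.2, 0.5], cum_weights=[0.3, 0.5, 1.0])
# Two weights for three numbers
attempt("length mismatch", weights=[0.5, 0.5])
# decimal.Decimal does not mix with the floats used internally
attempt("decimals", weights=[Decimal("0.3"), Decimal("0.2"), Decimal("0.5")])
# A valid call for comparison
attempt("valid", weights=[0.3, 0.2, 0.5])
```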

Method 2: Using numpy.random.choice

Another way to sample a random number from a probability distribution is to use the numpy.random.choice() function.

choice() is a function of the numpy.random module that generates a random sample from a given one-dimensional array (or an array-like object such as a list). With the default arguments, it returns a single value chosen uniformly at random from the array.

Syntax:
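numpy.random.choice(a, size=None, replace=True, p=None)

Parameter	Description
a	– Represents the one-dimensional array (or list) to sample from.
size	– Represents the number of values to draw, i.e. the length of the returned array. – By default, it is None, in which case a single value is returned.
replace	– Determines whether sampling is done with replacement, i.e. whether the same value can be selected more than once. – By default, it is True.
p	– Represents the probabilities associated with each entry of the given array. In simple words, it is the probability distribution of the values, and it must sum to 1. – By default, it is None, which means a uniform distribution.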

Approach: Call numpy.random.choice() with the list numbers, the required size, replace set to True, and the list of corresponding probabilities, so that it returns an array of the required size drawn from numbers according to those probabilities.

Code:

import numpy as np
numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
random_number = np.random.choice(numbers, size=5, replace=True, p=distributions)
print(random_number)

Output:

[30 20 30 10 30]
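
Unlike random.choices(), NumPy expects the probabilities in p to sum to 1, so raw weights need to be normalized first. The sketch below shows one way to do that, assuming the raw weights 3, 2 and 5 (chosen here so that they normalize to the distribution above). It uses the newer Generator interface, np.random.default_rng(), as an alternative to the legacy np.random.choice() call; the seed is arbitrary. It also shows replace=False, which draws distinct values.

```python
import numpy as np

numbers = [10, 20, 30]
raw_weights = np.array([3, 2, 5], dtype=float)

# Normalize so that the probabilities sum to 1: 0.3, 0.2, 0.5
p = raw_weights / raw_weights.sum()

rng = np.random.default_rng(seed=0)  # fixed seed only for repeatability

# Five draws with replacement (values may repeat)
sample = rng.choice(numbers, size=5, replace=True, p=p)
print(sample)

# Two draws without replacement (values are distinct)
unique_sample = rng.choice(numbers, size=2, replace=False, p=p)
print(unique_sample)
```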

Method 3: Using Scipy

SciPy is another handy library for dealing with random weighted distributions.

rv_discrete is a base class that is used to construct specific distribution instances and classes for discrete random variables. It is also used to construct an arbitrary distribution defined by a list of support points and corresponding probabilities. [source: Official Documentation]

Explanation: In the following code snippet, rv_discrete() takes a tuple as its values argument. The first element is the sequence of integer values contained in the list numbers, and the second element is the list of probability distributions/weights. Calling rvs() on the resulting object returns random values from the list based on their relative weights/probability distributions.

Code:

from scipy.stats import rv_discrete
numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
d = rv_discrete(values=(numbers, distributions))
print(d.rvs(size=5))

Output:

[30 10 30 30 20]

Method 4: Using Lea

Another effective Python library that helps us work with probability distributions is Lea. It is designed to let you model a broad range of random phenomena, like dice throwing, coin tossing, gambling results, weather forecasts, finance, etc.

Note: Since lea is an external library, you must install it before using it. Here's the command to install lea in your system: pip install lea

Code:

import lea

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
d = tuple(zip(numbers, distributions))
print(lea.pmf(d).random(5))

Output:

(30, 30, 30, 10, 20)

Exercises

Question 1: Our friend Harry has eight coloured crayons: ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]. Harry's weighted preferences for selecting each color are: [1/24, 1/6, 1/6, 1/12, 1/12, 1/24, 1/8, 7/24]. He is only allowed to select three colors at once. Show the combinations he might select in 10 attempts.

Solution:

import random
colors = ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]
distributions = [1/24, 1/6, 1/6, 1/12, 1/12, 1/24, 1/8, 7/24]
for i in range(10):
    choices = random.choices(colors, distributions, k=3)
    print(choices)

Output:

['orange', 'pink', 'green']
['blue', 'yellow', 'yellow']
['orange', 'green', 'black']
['blue', 'red', 'blue']
['orange', 'orange', 'red']
['orange', 'green', 'blue']
['orange', 'black', 'blue']
['black', 'yellow', 'green']
['pink', 'orange', 'orange']
['blue', 'blue', 'white']
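
Note that the same color can appear more than once within an attempt, because random.choices() samples with replacement. As an optional variation on the solution, the weights can be written as exact fractions using the standard fractions module, which makes it easy to confirm that they add up to exactly 1. This is a sketch of an alternative, not a required change to the solution.

```python
import random
from fractions import Fraction

colors = ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]
weights = [
    Fraction(1, 24), Fraction(1, 6), Fraction(1, 6), Fraction(1, 12),
    Fraction(1, 12), Fraction(1, 24), Fraction(1, 8), Fraction(7, 24),
]

# Exact arithmetic: no floating-point rounding in the sum.
total = sum(weights)
print(total)  # 1

# Fractions are accepted as weights by random.choices().
picks = random.choices(colors, weights=weights, k=3)
print(picks)
```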

Question 2:

Given:
cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]

The probability of a particular city being chosen depends on its population: the larger the population of a city, the higher the probability of that city being chosen. Based on this condition, find the probability distribution of the cities and display the city that might be selected in each of 10 attempts.

Solution:

import random
cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]
distributions = [round(pop / sum(populations), 2) for pop in populations]
print(distributions)
for i in range(10):
    print(random.choices(cities, distributions)[0])

Output:

[0.14, 0.12, 0.04, 0.27, 0.08, 0.35]
Freiburg
Frankfurt
Zürich
Hamburg
Stuttgart
Frankfurt
München
Frankfurt
München
München
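
Because random.choices() works with relative weights, the populations can also be passed directly, without converting them to rounded probabilities first. The sketch below does this and compares the observed frequencies over a large sample with the exact population shares. The sample size and seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter

cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]

random.seed(7)  # fixed seed only so that the run is repeatable
# The raw populations serve as relative weights; no normalization needed.
draws = random.choices(cities, weights=populations, k=100_000)
counts = Counter(draws)

total_population = sum(populations)
for city, pop in zip(cities, populations):
    expected = pop / total_population
    observed = counts[city] / len(draws)
    print(f"{city}: expected {expected:.3f}, observed {observed:.3f}")
```

This avoids the small rounding error introduced by round(..., 2) in the solution above, although for this exercise the difference is minor.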

With that we come to the end of this tutorial. I hope it has helped you. Please subscribe and stay tuned for more interesting tutorials and solutions. Happy learning! 🙂