Problem Formulation
Challenge: Given a list of numbers and a probability distribution over them, how do you select a number at random according to that distribution?
When you select a number from a list using a given probability distribution, each number is returned with a likelihood proportional to its relative weight (probability). Let’s visualize this with the help of an example.
Example:
Given:
numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
Expected Output: Choose the elements randomly from the given list and display 5 elements in the output list:
[30, 10, 20, 30, 30]
Note: The output can vary.
In this sample output, the number 30 appears three times, which is plausible because it has the highest weight/probability. The relative weights assigned are 0.3, 0.2 and 0.5, respectively. This means:
- Chances of selecting 10 are 30%.
- Chances of selecting 20 are 20%.
- Chances of selecting 30 are 50%.
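To make these percentages concrete, here is a minimal sketch of one way a weighted pick can work: build running totals of the weights, draw a uniform random number, and find the interval it falls into. The helper name weighted_pick and the seed 42 are illustrative assumptions, not part of any library.

```python
import bisect
import itertools
import random
from collections import Counter

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]


def weighted_pick(values, weights, rng):
    """Hypothetical helper: pick one value with probability proportional to its weight."""
    # Running totals of the weights, here roughly [0.3, 0.5, 1.0]
    cumulative = list(itertools.accumulate(weights))
    # Uniform random point in [0, total)
    r = rng.random() * cumulative[-1]
    # Index of the first running total that is greater than r
    index = bisect.bisect_right(cumulative, r)
    # Guard against floating-point edge cases at the upper end
    return values[min(index, len(values) - 1)]


rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable
draws = 10_000
counts = Counter(weighted_pick(numbers, distributions, rng) for _ in range(draws))
frequencies = {n: counts[n] / draws for n in numbers}
print(frequencies)
```

Over many draws, the printed frequencies should land close to 0.3, 0.2 and 0.5, even though any single short sample can look quite different.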
Note: We will first have a look at the numerous ways of solving the given question and then dive into a couple of exercises for further clarity. So without further delay, let’s dive into our mission-critical question and solve it.
Method 1: Using random.choices
- choices() is a method of the random module in Python that returns a list containing randomly selected items from the specified sequence. This sequence can be a list, tuple, string, or any other kind of sequence.
- The probability of picking each item can be specified using the weights or the cum_weights parameter.
Syntax:
random.choices(sequence, weights=None, *, cum_weights=None, k=1)
Parameter | Description |
---|---|
sequence | – It is a mandatory parameter. – Represents the sequence to sample from, such as a range of numbers, a list, a tuple, etc. |
weights | – It is an optional parameter. – Represents a list of relative weights, one per item, that determine how likely each item is to be picked. – By default, it is None. |
cum_weights | – It is an optional, keyword-only parameter. – Represents a list of cumulative weights, where each entry is the running total of the relative weights. For example, the relative weights [2, 3, 5] are equivalent to the cum_weights [2, 5, 10]. – By default, it is None. |
k | – It is an optional, keyword-only parameter. – Represents an integer that determines the length of the returned list. – By default, it is 1. |
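The relationship between relative and cumulative weights can be checked with a short sketch. It derives the cumulative weights with itertools.accumulate and, assuming the same seed is set before each call, compares the two ways of calling random.choices(). The seed value 1 is an arbitrary choice for illustration.

```python
import itertools
import random

numbers = [10, 20, 30]
weights = [2, 3, 5]

# Running totals of the relative weights
cum_weights = list(itertools.accumulate(weights))
print(cum_weights)  # [2, 5, 10]

# Reseed before each call so both calls consume the same random numbers
random.seed(1)
with_weights = random.choices(numbers, weights=weights, k=5)
random.seed(1)
with_cum_weights = random.choices(numbers, cum_weights=cum_weights, k=5)

print(with_weights)
print(with_cum_weights)
```

With the same seed, both calls are expected to return the same list, since the two weight lists describe the same distribution.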
Approach: Call the random.choices() function and feed in the given list and the weights/probability distributions as parameters.
Code:
import random

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

# Draw 5 items (with replacement) according to the given weights
random_number = random.choices(numbers, distributions, k=5)
print(random_number)
Output:
[10, 30, 30, 10, 20]
Caution:
- If neither the relative nor the cumulative weights are specified, the random.choices() function will automatically select elements with equal probability.
- The specified weights should always be of the same length as the specified sequence.
- If you specify relative weights as well as cumulative weights at the same time, you will get a TypeError (TypeError: Cannot specify both weights and cumulative weights). Hence, to avoid the error, do not specify both at the same time.
- The cum_weights or weights can only be integers, floats, and fractions. They cannot be decimals. Also, you must ensure that the weights are non-negative.
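The first three cautions can be observed with a small sketch. The helper describe_call is a hypothetical name introduced only for this illustration; it reports which exception type, if any, a given call to random.choices() raises.

```python
import random

numbers = [10, 20, 30]


def describe_call(**kwargs):
    """Hypothetical helper: return the name of the raised exception, or 'ok'."""
    try:
        random.choices(numbers, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


# No weights at all: every element is equally likely, no error
no_weights = describe_call(k=3)

# Relative and cumulative weights together are rejected
both = describe_call(weights=[0.3, 0.2, 0.5], cum_weights=[0.3, 0.5, 1.0])

# Two weights for three elements: the lengths do not match
mismatch = describe_call(weights=[0.5, 0.5])

print(no_weights, both, mismatch)
```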
Method 2: Using numpy.random.choice
Another way to sample a random number from a probability distribution is to use the numpy.random.choice() function.
choice() is a method of the numpy.random module that generates a random sample from a given one-dimensional array or sequence. It accepts the array as a parameter and randomly returns one or more of its values.
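Syntax: numpy.random.choice(a, size=None, replace=True, p=None)
Parameter | Description |
---|---|
a | – Represents the array or sequence to sample from. |
size | – Represents the number of values to return (the output shape). By default, it is None, and a single value is returned. |
replace | – Represents a boolean that determines whether a value can be selected more than once. By default, it is True. |
p | – Represents the probabilities associated with each value of the given array. In simple words, it is the probability distribution of the values, and it must sum to 1. By default, it is None, and all values are equally likely. |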
Approach: Use the numpy.random.choice(li, size, replace, weights) function such that replace is set to True to return a sample of the required size from the list li, drawn according to the corresponding probabilities in weights.
Code:
import numpy as np

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

# Arguments: values, sample size, replace=True, probabilities
random_number = np.random.choice(numbers, 5, True, distributions)
print(random_number)
Output:
[30 20 30 10 30]
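As a variation, here is a minimal sketch that uses NumPy's newer Generator interface (np.random.default_rng) and normalizes raw weights first, since NumPy expects the probabilities to sum to 1. The raw_weights values and the seed 42 are assumptions chosen for illustration.

```python
import numpy as np

numbers = [10, 20, 30]
raw_weights = [3, 2, 5]  # hypothetical unnormalized weights

# Convert the raw weights into probabilities that sum to 1
p = np.array(raw_weights, dtype=float)
p /= p.sum()

# Seeded generator (seed is an assumption) so the run is repeatable
rng = np.random.default_rng(seed=42)
sample = rng.choice(numbers, size=5, replace=True, p=p)
print(p)
print(sample)
```

Keyword arguments make the call easier to read than the positional form, and the seeded generator keeps the random state local instead of relying on NumPy's global state.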
Method 3: Using Scipy
Scipy is another handy library to deal with random weighted distributions.
rv_discrete is a base class that is used to construct specific distribution instances and classes for discrete random variables. It is also used to construct an arbitrary distribution defined by a list of support points and corresponding probabilities. [source: Official Documentation]
Explanation: In the following code snippet, rv_discrete() takes a tuple through its values argument: the sequence of integer values contained in the list numbers comes first, and the probability distributions/weights come second. The rvs() method then returns random values from the list based on their relative weights/probability distributions.
Code:
from scipy.stats import rv_discrete

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

# Build a discrete distribution from (values, probabilities)
d = rv_discrete(values=(numbers, distributions))
print(d.rvs(size=5))
Output:
[30 10 30 30 20]
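Because rv_discrete builds a full distribution object, it can also report properties of the distribution rather than only drawing samples. The sketch below queries the probability of a single value and the mean, and passes random_state to rvs() for a repeatable draw. The seed 0 is an arbitrary assumption.

```python
from scipy.stats import rv_discrete

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

d = rv_discrete(values=(numbers, distributions))

# Probability mass assigned to the value 30
prob_30 = d.pmf(30)

# Expected value: 10*0.3 + 20*0.2 + 30*0.5 = 22
mean = d.mean()

# Repeatable sample via a fixed random_state (seed is an assumption)
sample = d.rvs(size=5, random_state=0)

print(prob_30, mean)
print(sample)
```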
Method 4: Using Lea
Another effective Python library that helps us to work with probability distributions is Lea. It is specifically designed to let you model a broad range of random phenomena, like dice throwing, coin tossing, gambling results, weather forecasts, finance, etc.
Note: Since lea is an external library, you must install it before using it. Here’s the command to install lea on your system: pip install lea
Code:
import lea

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

# Pair each number with its probability, e.g. (10, 0.3)
d = tuple(zip(numbers, distributions))
print(lea.pmf(d).random(5))
Output:
(30, 30, 30, 10, 20)
Exercises
Question 1: Our friend Harry has eight coloured crayons: [“red”, “green”, “blue”, “yellow”, “black”, “white”, “pink”, “orange”]. Harry has the weighted preference for selecting each color as: [1/24, 1/6, 1/6, 1/12, 1/12, 1/24, 1/8, 7/24]. He is only allowed to select three colors at once. Find the various combinations he can select in 10 attempts.
Solution:
import random

colors = ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]
distributions = [1/24, 1/6, 1/6, 1/12, 1/12, 1/24, 1/8, 7/24]

# Each attempt draws three colors; the same color may repeat
for i in range(10):
    choices = random.choices(colors, distributions, k=3)
    print(choices)
Output:
['orange', 'pink', 'green']
['blue', 'yellow', 'yellow']
['orange', 'green', 'black']
['blue', 'red', 'blue']
['orange', 'orange', 'red']
['orange', 'green', 'blue']
['orange', 'black', 'blue']
['black', 'yellow', 'green']
['pink', 'orange', 'orange']
['blue', 'blue', 'white']
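Note that random.choices() samples with replacement, which is why a color can show up twice in one attempt (for example, 'yellow' above). If the exercise is read as requiring three different colors per attempt, one possible approach is NumPy's choice with replace=False. This is a sketch under that assumed reading, with an arbitrary seed of 7.

```python
import numpy as np

colors = ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]
distributions = [1/24, 1/6, 1/6, 1/12, 1/12, 1/24, 1/8, 7/24]

# Seeded generator (seed is an assumption) so the run is repeatable
rng = np.random.default_rng(seed=7)

# replace=False prevents a color from being picked twice in one attempt
attempts = [
    rng.choice(colors, size=3, replace=False, p=distributions).tolist()
    for _ in range(10)
]

for attempt in attempts:
    print(attempt)
```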
Question 2:
Given:
cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]
The probability of a particular city being chosen depends on its population: the larger the population of a city, the higher the probability of that city being chosen. Based on this condition, find the probability distribution of the cities and display the cities that might be selected in 10 attempts.
Solution:
import random

cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]

# Divide each population by the total to get its probability
distributions = [round(pop / sum(populations), 2) for pop in populations]
print(distributions)

for i in range(10):
    print(random.choices(cities, distributions)[0])
Output:
[0.14, 0.12, 0.04, 0.27, 0.08, 0.35]
Freiburg
Frankfurt
Zürich
Hamburg
Stuttgart
Frankfurt
München
Frankfurt
München
München
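Since random.choices() works with relative weights, the populations can also be passed directly as weights, which avoids the small precision loss from rounding. The following sketch shows that alternative; the seed 0 and the variable names exact and picks are assumptions for illustration.

```python
import random

cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]

# Unrounded probabilities, shown only for reference
total = sum(populations)
exact = [pop / total for pop in populations]
print(exact)

# The raw populations act as relative weights; no normalization is needed
random.seed(0)  # fixed seed (an assumption) so the run is repeatable
picks = random.choices(cities, weights=populations, k=10)
print(picks)
```

Requesting all 10 picks in one call with k=10 is equivalent in spirit to the loop in the solution, and the unrounded weights keep the probabilities exactly proportional to the populations.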
With that we come to the end of this tutorial. I hope it has helped you. Please subscribe and stay tuned for more interesting tutorials and solutions. Happy learning! 🙂