Challenge: Given a list of numbers, how do you select a number at random according to a given probability distribution?
When you select a number from a list using a given probability distribution, each number is drawn with a likelihood proportional to its relative weight (probability). Let’s visualize this with the help of an example.
Given:
numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
Expected Output: Choose the elements randomly from the given list and display 5 elements in the output list:
[30, 10, 20, 30, 30]
Note: The output can vary.
In this sample output, the number 30 appears three times, which is plausible because it has the highest weight/probability. The relative weights assigned are 0.3, 0.2, and 0.5, respectively. This means:
- Chances of selecting 10 are 30%.
- Chances of selecting 20 are 20%.
- Chances of selecting 30 are 50%.
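To make these percentages concrete, here is a minimal sketch that draws a large sample and counts how often each number appears. The sample size of 100,000 and the fixed seed are arbitrary choices for illustration; the observed shares should land close to 0.3, 0.2, and 0.5 but will not match them exactly.

```python
import random
from collections import Counter

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

random.seed(0)  # fixed seed only so the run is repeatable
draws = random.choices(numbers, weights=distributions, k=100_000)

# Count how often each number was drawn and print its observed share
counts = Counter(draws)
for number in numbers:
    share = counts[number] / len(draws)
    print(f"{number}: {share:.3f}")
```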
Note: We will first have a look at the numerous ways of solving the given question and then dive into a couple of exercises for further clarity. So without further delay, let’s dive into our mission-critical question and solve it.
Method 1: Using random.choices
choices() is a method of the random module in Python that returns a list of randomly selected items from the specified sequence, drawn with replacement. This sequence can be a list, tuple, string, or any other kind of sequence.
- The likelihood of picking each item can be specified using the weights or the cum_weights parameter.
Syntax: random.choices(sequence, weights=None, cum_weights=None, k=1)
| Parameter | Description |
|---|---|
| sequence | Mandatory. A sequence such as a range of numbers, a list, a tuple, etc. |
| weights | Optional. A list of relative weights, one per item, that determine how likely each value is to be picked. Defaults to None. |
| cum_weights | Optional, keyword-only. A list of weights like weights, except that the values are accumulated. For example, the relative weights [0.3, 0.2, 0.5] correspond to the cumulative weights [0.3, 0.5, 1.0]. Defaults to None. |
| k | Optional, keyword-only. An integer that determines the length of the returned list. Defaults to 1. |
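The relationship between weights and cum_weights can be sketched as follows. This example assumes the cumulative weights are simply the running totals of the relative weights, built here with itertools.accumulate, and reseeds the generator before each call so the two draws can be compared. It is an illustration, not a statement about how you must construct cum_weights.

```python
import random
from itertools import accumulate

numbers = [10, 20, 30]
weights = [0.3, 0.2, 0.5]

# Running totals of the relative weights
cum_weights = list(accumulate(weights))
print(cum_weights)

# Same seed, once with relative weights and once with cumulative weights
random.seed(1)
by_weights = random.choices(numbers, weights=weights, k=5)
random.seed(1)
by_cum_weights = random.choices(numbers, cum_weights=cum_weights, k=5)

print(by_weights == by_cum_weights)
```

Passing cum_weights directly can save a little work when you call random.choices() many times with the same distribution, since the running totals do not have to be recomputed on each call.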
Approach: Call the random.choices() function and pass in the given list and the weights/probability distribution as arguments.
import random

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
# Draw 5 numbers (with replacement) according to the given weights
random_number = random.choices(numbers, distributions, k=5)
print(random_number)

Output:
[10, 30, 30, 10, 20]
- If neither relative nor cumulative weights are specified, the random.choices() function selects elements with equal probability.
- The specified weights must always be of the same length as the specified sequence; otherwise, a ValueError is raised.
- If you specify relative weights and cumulative weights at the same time, you will get a TypeError (TypeError: Cannot specify both weights and cumulative weights). To avoid the error, specify only one of them.
- weights can only be integers, floats, and fractions. They cannot be decimals. Also, you must ensure that the weights are non-negative.
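The restrictions above can be checked with a short sketch. The three cases and their labels are made up for illustration; each call is expected to fail, and the loop records which exception type was raised.

```python
import random
from decimal import Decimal

numbers = [10, 20, 30]

# Hypothetical invalid calls, one per restriction listed above
cases = {
    "both weights and cum_weights": dict(
        weights=[0.3, 0.2, 0.5], cum_weights=[0.3, 0.5, 1.0]
    ),
    "wrong number of weights": dict(weights=[0.5, 0.5]),
    "Decimal weights": dict(
        weights=[Decimal("0.3"), Decimal("0.2"), Decimal("0.5")]
    ),
}

errors = {}
for label, kwargs in cases.items():
    try:
        random.choices(numbers, **kwargs)
    except (TypeError, ValueError) as exc:
        # Remember the exception type and show its message
        errors[label] = type(exc).__name__
        print(f"{label}: {type(exc).__name__}: {exc}")
```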
Method 2: Using numpy.random.choice
Another way to sample a random number from a probability distribution is to use the numpy.random.choice() function.
choice() is a method of the numpy.random module that generates random values from a given array. It accepts an array as a parameter and randomly returns one or more of the values from the array.
Syntax: numpy.random.choice(a, size=None, replace=True, p=None)
| Parameter | Description |
|---|---|
| a | The array (or list) of values to sample from. |
| size | An integer that determines the length of the returned array. |
| replace | Whether to sample with replacement. Defaults to True. |
| p | The probability of picking each value of the given array. In simple words, it is the probability distribution over the array. The probabilities must sum to 1. |
Approach: Use the numpy.random.choice(a, size, replace, p) function with replace set to True to return an array of the required size, where a is the list to sample from and p is the list of corresponding probabilities.
import numpy as np

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
# Draw 5 numbers with replacement, using the given probabilities
random_number = np.random.choice(numbers, size=5, replace=True, p=distributions)
print(random_number)

Output:
[30 20 30 10 30]
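The replace and p parameters can be explored a little further with the sketch below. Note that it uses NumPy's newer Generator interface (np.random.default_rng) rather than the np.random.choice call shown above; that swap, the seed value, and the deliberately invalid probability list are assumptions made for illustration.

```python
import numpy as np

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]

rng = np.random.default_rng(seed=42)  # seeded only for repeatability

# Without replacement, each number can appear at most once,
# so a sample of size 3 is a weighted shuffle of the list.
unique_sample = rng.choice(numbers, size=3, replace=False, p=distributions)
print(unique_sample)

# Probabilities that do not sum to 1 are rejected with a ValueError.
error_message = None
try:
    rng.choice(numbers, size=5, p=[0.3, 0.2, 0.4])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Unlike random.choices(), which treats its weights as relative values, NumPy expects p to be a proper probability distribution, so raw weights need to be normalized first.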
Method 3: Using Scipy
SciPy is another handy library for working with weighted random distributions.
rv_discrete is a base class that is used to construct specific distribution instances and classes for discrete random variables. It is also used to construct an arbitrary distribution defined by a list of support points and corresponding probabilities. [source: Official Documentation]
Explanation: In the following code snippet, rv_discrete() receives a tuple through its values argument. The first element of the tuple is the sequence of integer values contained in the list numbers, and the second element is the list of probability distributions/weights. The rvs() method then returns random values from the list based on their relative weights/probability distributions.
from scipy.stats import rv_discrete

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
# Build a discrete distribution from (values, probabilities)
d = rv_discrete(values=(numbers, distributions))
print(d.rvs(size=5))

Output:
[30 10 30 30 20]
Method 4: Using Lea
Another effective Python library that helps us work with probability distributions is Lea. It allows you to model a broad range of random phenomena, like dice throwing, coin tossing, gambling results, weather forecasts, finance, etc.
Since lea is an external library, you must install it before using it. Here’s the command to install lea on your system:
pip install lea
import lea

numbers = [10, 20, 30]
distributions = [0.3, 0.2, 0.5]
# Pair each number with its probability
d = tuple(zip(numbers, distributions))
print(lea.pmf(d).random(5))

Output:
(30, 30, 30, 10, 20)
Question 1: Our friend Harry has eight coloured crayons: ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]. Harry’s weighted preference for selecting each color is: [1/24, 1/6, 1/6, 1/12, 1/12, 1/24, 1/8, 7/24]. He is only allowed to select three colors at once. Find the various combinations he can select in 10 attempts.
import random

colors = ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]
distributions = [1/24, 1/6, 1/6, 1/12, 1/12, 1/24, 1/8, 7/24]
# Make 10 attempts, picking 3 colors (with replacement) each time
for i in range(10):
    choices = random.choices(colors, distributions, k=3)
    print(choices)

Output:
['orange', 'pink', 'green']
['blue', 'yellow', 'yellow']
['orange', 'green', 'black']
['blue', 'red', 'blue']
['orange', 'orange', 'red']
['orange', 'green', 'blue']
['orange', 'black', 'blue']
['black', 'yellow', 'green']
['pink', 'orange', 'orange']
['blue', 'blue', 'white']
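Because random.choices() also accepts fractions as weights, Harry's preferences can be written exactly instead of as floats. The sketch below is one possible variation, with the list name preferences chosen for illustration; it also confirms that the given preferences add up to 1.

```python
import random
from fractions import Fraction

colors = ["red", "green", "blue", "yellow", "black", "white", "pink", "orange"]
# Exact weights, in the same order as the colors
preferences = [
    Fraction(1, 24), Fraction(1, 6), Fraction(1, 6), Fraction(1, 12),
    Fraction(1, 12), Fraction(1, 24), Fraction(1, 8), Fraction(7, 24),
]

# The preferences form a valid probability distribution if they sum to 1
total = sum(preferences)
print(total)

# One attempt: pick three colors (with replacement)
picks = random.choices(colors, weights=preferences, k=3)
print(picks)
```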
Question 2: Given:
cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]
The probability of a particular city being chosen depends on its population: the larger the population of a city, the higher the probability of that city being chosen. Based on this condition, find the probability distribution of the cities and display the city that might be selected in each of 10 attempts.
import random

cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]
# Each city's share of the total population, rounded to 2 decimals
total_population = sum(populations)
distributions = [round(pop / total_population, 2) for pop in populations]
print(distributions)
for i in range(10):
    # random.choices() returns a list, so take its single element
    print(random.choices(cities, distributions)[0])

Output:
[0.14, 0.12, 0.04, 0.27, 0.08, 0.35]
Freiburg
Frankfurt
Zürich
Hamburg
Stuttgart
Frankfurt
München
Frankfurt
München
München
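Since random.choices() interprets its weights as relative values, the populations can also be passed directly as weights, which avoids the small loss of precision from rounding. The following is a minimal sketch of that alternative; it draws all 10 cities in one call instead of looping.

```python
import random

cities = ["Frankfurt", "Stuttgart", "Freiburg", "München", "Zürich", "Hamburg"]
populations = [736000, 628000, 228000, 1450000, 409241, 1841179]

# Raw populations act as relative weights; no normalization is needed
picks = random.choices(cities, weights=populations, k=10)
for city in picks:
    print(city)
```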
With that we come to the end of this tutorial. I hope it has helped you. Please subscribe and stay tuned for more interesting tutorials and solutions. Happy learning! 🙂
Recommended Read: Python’s Random Module – Everything You Need to Know to Get Started