How I Solved a Real-World Problem Using Monte Carlo Simulation

In this project, you are taking the role of a data analyst. Your client, a large retail outlet has been running an affiliate marketing program in an effort to increase its sales. You are tasked to come up with a predictive model to predict how much should be budgeted for sales commission for the following year.

This is one of the projects you will likely come across as a data analyst. How do you go about solving this problem? No doubt, you will ask your client for more information. For example, data showing previous years’ commission payments, the number of sales representatives, and the company’s defined formula for calculating commissions.

For this project, we assumed that the only information at our disposal is a rate schedule showing the company’s formula for calculating commissions.

Rates Schedule

Sales ($)Rates
1,000 – 100,0003%
101,000 – 500,0005%
501,000 – 1,000,00010%

The table shows that the more sales, the higher the commission rate. Since we were neither given sales historical data nor the number of sales representatives, we are going to make a few assumptions.

Basic Assumptions

We will generate  random sales data based on the following assumptions which are also based on the information from the rates schedule:

  1. Sales made cannot be below 1000 dollars.
  2. Sales made are in thousands.
  3. The company employed 500 persons for the affiliate marketing program.
  4. Sales made by each individual cannot be more than 1 million dollars in a given year.

With these assumptions, we are ready to generate random data.

Python’s random module is used to generate random numbers. We will use the sample() method which takes in a range of numbers and generate random data based on the length defined.

The parameters for the range() function are given as: range(start, stop, step) where:

  1. start: The starting point. In our case, it is 1000 as we assumed that the sales amount cannot be below 1000 dollars.
  2. stop: The ending point. In our case, it is 1 million. We assumed that it won’t go beyond that.
  3. step: The number to skip to ensure that the data end with three zeros as given in the rates schedule.

The sample() method takes another parameter that indicates the length of the random numbers. Remember that each randomly generated number represents the sales made by each individual. So, the length will be 500 to indicate the number of workforces employed by the company.

import random

sales = random.sample(range(1000, 1000000, 1000), 500)

Let’s load the data on a Pandas DataFrame.

import pandas as pd

data = pd.DataFrame({'Sales': sales})

data
0    986000
1    451000
2    487000
3    982000
4    194000
..      ...
495  761000
496  615000
497  786000
498  952000
499   82000

[500 rows x 1 columns]

data.Sales.min()
 2000

data.Sales.max()
986000

Now that we have our sales data. The next thing we have to do is to calculate the commission. The commission amount on the other hand is based on the rates. So, we need a way to assign rates for each sales representative based on sales made. A defined function will accomplish that.

def commissionRates(sales):
    if sales <= 100000:
        return .03
    elif sales <= 500000:
        return .05
    else:
        return .1

With this function defined, we can easily create the rates column using Pandas’ apply function.

data['Rates'] = data.Sales.apply(commissionRates)
data.head(10)

       Sales   Rates
0  986000   0.10
1  451000   0.05
2  487000   0.05
3  982000   0.10
4  194000   0.05
5  305000   0.05
6  154000   0.05
7  768000   0.10
8  685000   0.10
9  885000   0.10

The rates are applied according to the rates schedule given to us. To get the commission amount, it’s a matter of multiplying the rates by the sales amount.

data['Commission'] = data.Sales * data.Rates
data.tail(10)

          Sales     Rates  Commission
490  149000   0.05      7450.0
491  793000   0.10     79300.0
492  205000   0.05     10250.0
493  894000   0.10     89400.0
494  276000   0.05     13800.0
495  761000   0.10     76100.0
496  615000   0.10     61500.0
497  786000   0.10     78600.0
498  952000   0.10     95200.0
499   82000   0.03      2460.0

We now have the total commission amount of $ 21,831,140 to be paid to sales representatives.

Monte Carlo Simulation

Monte Carlo analysis is a useful tool for making predictions. It takes as an input, different random numbers and runs them several times to generate many probable outcomes. How do we apply it in our case?

We want to predict the commission amount to be budgeted for the coming year. The amount we realized was based on one single attempt or iteration. We can’t just conclude that this is the ideal amount until we run it several times up to 10,000 or more using different randomly generated data.

def simulate_commission(iterations):
    results = []
    for i in range(iterations):
    sales = random.sample(range(1000, 1000000, 1000), 500)
    data = pd.DataFrame({'Sales': sales})
    data['Rates'] = data.Sales.apply(commissionRates)
    data['Commission'] = data.Sales * data.Rates
    results.append(
        [
        data.Sales.sum().round(),
        data.Commission.sum().round(),
        ]
    )
    
    return results

We create a function that will perform the simulation based on the number of times chosen. We can try 1,000, 10,000 or even 50,000 times. In the end, we sum the sales and the commissions to be used for further analysis.

Let’s give it a try with 5000 simulations and load it on a Pandas DataFrame.

results = simulate_commission(5000)
results = pd.DataFrame.from_records(results, columns=[‘Sales’, ‘Commission’])

results
          Sales        Commission
0     256137000  22305150.0
1     252466000  21865310.0
2     246516000  21478230.0
3     250821000  21669860.0
4     252750000  22314280.0
...         ...         ...
4995  253876000  22073200.0
4996  245326000  21089810.0
4997  248717000  21651090.0
4998  246074000  21105970.0
4999  251908000  21810780.0

results.describe().round()
                  
                  Sales   Commission
count       5000.0      5000.0
mean   250092538.0  21826489.0
std      4526122.0    523127.0
min    229959000.0  19379090.0
25%    247034250.0  21476432.0
50%    250058000.0  21816840.0
75%    253203250.0  22188120.0
max    266212000.0  23649430.0

After running 5000 simulations, the results show that the commission amount is between $19M and $23M with a mean of $21M. This is within our previous results. Let’s visualize this on a histogram.

def currency(x, pos):
    if x >= 1e6:
        s = '${:1.1f}M'.format(x*1e-6)
    else:
        s = '${:1.0f}K'.format(x*1e-3)
    return s

Here, we are setting the function that will format the commission amount as it’s too large to be fitted on the chart. It will take the amount and the tick position as parameters. We will feed it into the xaxis.set_major_formatter() method to format the string.

import matplotlib.pyplot

fig, ax = plt.subplots()

ax.xaxis.set_major_formatter(currency)

results.Commission.plot(kind='hist', title='Commission Amount')

plt.grid()
plt.show()

Running the simulation 10,000 times produced the following result which is almost similar to the previous one.

The mean is still between $21M and $22M, and the commission payment ranges between $19M and $24M.

These results no doubt give the managers of the retail outlet an estimate of how much should be budgeted for commission payment for the following year.

Conclusion

We solved a real-world problem using Monte Carlo simulation – a tool used to gain a better understanding of distributions of probable outcomes. By calculating a formula multiple times using different random inputs, management can be able to make informed decisions.

This demonstration gives you an idea of how you can apply Monte Carlo simulation to solve more advanced problems.