In this project, you are taking the role of a data analyst. Your client, a large retail outlet has been running an affiliate marketing program in an effort to increase its sales. You are tasked to come up with a predictive model to predict how much should be budgeted for sales commission for the following year.
This is one of the projects you will likely come across as a data analyst. How do you go about solving this problem? No doubt, you will ask your client for more information. For example, data showing previous years’ commission payments, the number of sales representatives, and the company’s defined formula for calculating commissions.
For this project, we assumed that the only information at our disposal is a rate schedule showing the company’s formula for calculating commissions.
Rates Schedule
Sales ($) | Rates |
1,000 – 100,000 | 3% |
101,000 – 500,000 | 5% |
501,000 – 1,000,000 | 10% |
The table shows that the more sales, the higher the commission rate. Since we were neither given sales historical data nor the number of sales representatives, we are going to make a few assumptions.
Basic Assumptions
We will generate random sales data based on the following assumptions which are also based on the information from the rates schedule:
- Sales made cannot be below 1000 dollars.
- Sales made are in thousands.
- The company employed 500 persons for the affiliate marketing program.
- Sales made by each individual cannot be more than 1 million dollars in a given year.
With these assumptions, we are ready to generate random data.
Python’s random module is used to generate random numbers. We will use the sample()
method which takes in a range of numbers and generate random data based on the length defined.
The parameters for the range()
function are given as: range(start, stop, step)
where:
start
: The starting point. In our case, it is 1000 as we assumed that the sales amount cannot be below 1000 dollars.stop
: The ending point. In our case, it is 1 million. We assumed that it won’t go beyond that.step
: The number to skip to ensure that the data end with three zeros as given in the rates schedule.
The sample()
method takes another parameter that indicates the length of the random numbers. Remember that each randomly generated number represents the sales made by each individual. So, the length will be 500 to indicate the number of workforces employed by the company.
import random sales = random.sample(range(1000, 1000000, 1000), 500)
Let’s load the data on a Pandas DataFrame.
import pandas as pd data = pd.DataFrame({'Sales': sales}) data 0 986000 1 451000 2 487000 3 982000 4 194000 .. ... 495 761000 496 615000 497 786000 498 952000 499 82000 [500 rows x 1 columns] data.Sales.min() 2000 data.Sales.max() 986000
Now that we have our sales data. The next thing we have to do is to calculate the commission. The commission amount on the other hand is based on the rates. So, we need a way to assign rates for each sales representative based on sales made. A defined function will accomplish that.
def commissionRates(sales): if sales <= 100000: return .03 elif sales <= 500000: return .05 else: return .1
With this function defined, we can easily create the rates column using Pandas’ apply
function.
data['Rates'] = data.Sales.apply(commissionRates) data.head(10) Sales Rates 0 986000 0.10 1 451000 0.05 2 487000 0.05 3 982000 0.10 4 194000 0.05 5 305000 0.05 6 154000 0.05 7 768000 0.10 8 685000 0.10 9 885000 0.10
The rates are applied according to the rates schedule given to us. To get the commission amount, it’s a matter of multiplying the rates by the sales amount.
data['Commission'] = data.Sales * data.Rates data.tail(10) Sales Rates Commission 490 149000 0.05 7450.0 491 793000 0.10 79300.0 492 205000 0.05 10250.0 493 894000 0.10 89400.0 494 276000 0.05 13800.0 495 761000 0.10 76100.0 496 615000 0.10 61500.0 497 786000 0.10 78600.0 498 952000 0.10 95200.0 499 82000 0.03 2460.0
We now have the total commission amount of $ 21,831,140 to be paid to sales representatives.
Monte Carlo Simulation
Monte Carlo analysis is a useful tool for making predictions. It takes as an input, different random numbers and runs them several times to generate many probable outcomes. How do we apply it in our case?
We want to predict the commission amount to be budgeted for the coming year. The amount we realized was based on one single attempt or iteration. We can’t just conclude that this is the ideal amount until we run it several times up to 10,000 or more using different randomly generated data.
def simulate_commission(iterations): results = [] for i in range(iterations): sales = random.sample(range(1000, 1000000, 1000), 500) data = pd.DataFrame({'Sales': sales}) data['Rates'] = data.Sales.apply(commissionRates) data['Commission'] = data.Sales * data.Rates results.append( [ data.Sales.sum().round(), data.Commission.sum().round(), ] ) return results
We create a function that will perform the simulation based on the number of times chosen. We can try 1,000, 10,000 or even 50,000 times. In the end, we sum the sales and the commissions to be used for further analysis.
Let’s give it a try with 5000 simulations and load it on a Pandas DataFrame.
results = simulate_commission(5000) results = pd.DataFrame.from_records(results, columns=[‘Sales’, ‘Commission’]) results Sales Commission 0 256137000 22305150.0 1 252466000 21865310.0 2 246516000 21478230.0 3 250821000 21669860.0 4 252750000 22314280.0 ... ... ... 4995 253876000 22073200.0 4996 245326000 21089810.0 4997 248717000 21651090.0 4998 246074000 21105970.0 4999 251908000 21810780.0 results.describe().round() Sales Commission count 5000.0 5000.0 mean 250092538.0 21826489.0 std 4526122.0 523127.0 min 229959000.0 19379090.0 25% 247034250.0 21476432.0 50% 250058000.0 21816840.0 75% 253203250.0 22188120.0 max 266212000.0 23649430.0
After running 5000 simulations, the results show that the commission amount is between $19M and $23M with a mean of $21M. This is within our previous results. Let’s visualize this on a histogram.
def currency(x, pos): if x >= 1e6: s = '${:1.1f}M'.format(x*1e-6) else: s = '${:1.0f}K'.format(x*1e-3) return s
Here, we are setting the function that will format the commission amount as it’s too large to be fitted on the chart. It will take the amount and the tick position as parameters. We will feed it into the xaxis.set_major_formatter()
method to format the string.
import matplotlib.pyplot fig, ax = plt.subplots() ax.xaxis.set_major_formatter(currency) results.Commission.plot(kind='hist', title='Commission Amount') plt.grid() plt.show()
Running the simulation 10,000 times produced the following result which is almost similar to the previous one.
The mean is still between $21M and $22M, and the commission payment ranges between $19M and $24M.
These results no doubt give the managers of the retail outlet an estimate of how much should be budgeted for commission payment for the following year.
Conclusion
We solved a real-world problem using Monte Carlo simulation – a tool used to gain a better understanding of distributions of probable outcomes. By calculating a formula multiple times using different random inputs, management can be able to make informed decisions.
This demonstration gives you an idea of how you can apply Monte Carlo simulation to solve more advanced problems.