Problem Formulation and Solution Overview
Let’s say you have written several code sections. Each section performs identical tasks. How could you best determine which section works faster and more efficiently?
Preparation
Before moving forward, please ensure the pandas and NumPy libraries are installed.
After ensuring the above libraries are installed, add the code below to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import numpy as np
import time
Method 1: Use np.random.randint()
This example uses the np.random.randint() function from the NumPy library to generate random integers. The execution time of this script is recorded and displayed.
start_time = time.time()
df = pd.DataFrame(np.random.randint(1, 800, size=(1000000, 1)), columns=['random_nums'])
df['random_nums'] = df['random_nums'].astype(float)
print(df['random_nums'])
print(f'Execution time was {time.time() - start_time} seconds.')
The first line in the above code calls the time.time() function from the time library and saves the result to start_time. The execution time starts counting from this point forward.
The following line creates a new DataFrame by passing it two (2) arguments:
- The first argument is np.random.randint(), used to generate random integers. This function is passed three (3) arguments:
– low: the lowest integer that can be drawn (1)
– high: the upper bound (800), which is exclusive, so the highest integer that can be drawn is 799
– size: the shape of the returned array, here 1,000,000 rows by 1 column (1000000, 1)
- The second argument is a column name to hold the randomly generated integers (columns=['random_nums']).
The results are assigned to a new DataFrame variable, df.
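The bounds and shape described above can be checked on a smaller sample. This is a minimal sketch; the sample size of 10,000 and the variable name sample are chosen only for illustration.

```python
import numpy as np

# Draw a small sample with the same low and high values the article uses.
sample = np.random.randint(1, 800, size=(10000, 1))

# low is inclusive and high is exclusive, so every value falls in 1..799.
print(sample.shape)
print(sample.min() >= 1, sample.max() <= 799)
```

The size tuple controls the shape of the array, which is why the DataFrame ends up with a single column.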
To extend the execution time of this script, the contents of the DataFrame column df['random_nums'] are converted from an integer data type (numpy.int32 here; the default integer width depends on the platform and NumPy version) to numpy.float64. The results are saved back to df['random_nums'] and output to the terminal.
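To see which data types are involved on your own machine, the dtype attribute can be inspected before and after the conversion. This is a small sketch with a reduced row count; the names before and after are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 800, size=(1000, 1)), columns=['random_nums'])

# An integer dtype; the exact width depends on the platform and NumPy version.
before = df['random_nums'].dtype

df['random_nums'] = df['random_nums'].astype(float)

# Python's float maps to numpy.float64.
after = df['random_nums'].dtype

print(before, '->', after)
```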
Below are the top five (5) rows of the DataFrame.
| | random_nums |
| 0 | 170.0 |
| 1 | 784.0 |
| 2 | 730.0 |
| 3 | 527.0 |
| 4 | 183.0 |
The last line calculates the script’s execution time and outputs the same to the terminal.
Execution time was 0.011309146881103516 seconds.
However, if we change the conversion so that the column is converted to a string (str) instead of a float and run this code, the execution time is higher than above.
start_time = time.time()
df = pd.DataFrame(np.random.randint(1, 800, size=(1000000, 1)), columns=['random_nums'])
df['random_nums'] = df['random_nums'].astype(str)
print(df['random_nums'])
print(f'Execution time was {time.time() - start_time} seconds.')
Execution time was 0.21493077278137207 seconds.
This outcome points out how important it is to keep an eye on what code you use. There may be a more efficient way to attain the same results. The goal is to keep your code clean, efficient and fast!
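For measuring short intervals, time.perf_counter() is generally a better fit than time.time(), because it is a monotonic, high-resolution clock. The sketch below wraps the two conversions in a helper so they can be compared side by side. The helper name time_conversion and the reduced row count are assumptions made for illustration, not part of the original scripts.

```python
import time

import numpy as np
import pandas as pd


def time_conversion(target_type, rows=100_000):
    """Return the seconds taken to build the DataFrame and convert its column."""
    start = time.perf_counter()
    df = pd.DataFrame(np.random.randint(1, 800, size=(rows, 1)), columns=['random_nums'])
    df['random_nums'] = df['random_nums'].astype(target_type)
    return time.perf_counter() - start


float_secs = time_conversion(float)
str_secs = time_conversion(str)
print(f'float: {float_secs:.4f} s, str: {str_secs:.4f} s')
```

The print() call is left out of the timed region here, so the figures reflect only the DataFrame creation and the conversion.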
Method 2: Use np.random.default_rng()
This example uses the np.random.default_rng() function from the NumPy library to generate random integers. The execution time of this script is recorded and displayed.
start_time = time.time()
num_range = np.random.default_rng()
df = pd.DataFrame(num_range.integers(1, 800, size=(1000000, 1)), columns=['random_nums'])
df['random_nums'] = df['random_nums'].astype(float)
print(df['random_nums'])
print(f'Execution time was {time.time() - start_time} seconds.')
The first line in the above code calls the time.time() function from the time library and saves the result to start_time. The execution time starts counting from this point forward.
The following line calls the np.random.default_rng() function, which returns a Generator object used to generate random numbers (in this case, integers). This object is saved to num_range.
The next line creates a new DataFrame by passing it two (2) arguments:
- The first argument is num_range.integers(), used to generate random integers. This method is passed three (3) arguments:
– low: the lowest integer that can be drawn (1)
– high: the upper bound (800), which is exclusive by default, so the highest integer that can be drawn is 799
– size: the shape of the returned array, here 1,000,000 rows by 1 column (1000000, 1)
- The second argument is a column name to hold the randomly generated integers (columns=['random_nums']).
The results are assigned to a new DataFrame, df.
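Two features of the Generator are worth a quick illustration: it can be seeded for repeatable draws, and integers() accepts endpoint=True to make the upper bound inclusive. This is a minimal sketch; the seed 42 and the variable names are arbitrary choices.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.integers(1, 800, size=(5, 1))
second = rng_b.integers(1, 800, size=(5, 1))
print((first == second).all())  # True: same seed, same draws

# With endpoint=True the upper bound is inclusive, so 800 can be drawn.
inclusive = rng_a.integers(1, 800, size=(5, 1), endpoint=True)
print(inclusive.max() <= 800)
```

Seeding is useful when timing code, since every run then works on identical data.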
To extend the execution time of this script, the contents of the DataFrame column df['random_nums'] are converted from an integer data type (numpy.int32 here; the default integer width depends on the platform and NumPy version) to numpy.float64. The results are saved back to df['random_nums'] and output to the terminal.
Below are the top five (5) rows of the DataFrame.
| | random_nums |
| 0 | 728.0 |
| 1 | 371.0 |
| 2 | 455.0 |
| 3 | 121.0 |
| 4 | 509.0 |
The last line calculates the script’s execution time and outputs the same to the terminal.
Execution time was 0.009771108627319336 seconds.
As in the test in Method 1, the conversion is then changed so that the column is converted to a string instead of a float and, as above, the script takes longer to execute.
start_time = time.time()
num_range = np.random.default_rng()
df = pd.DataFrame(num_range.integers(1, 800, size=(1000000, 1)), columns=['random_nums'])
df['random_nums'] = df['random_nums'].astype(str)
print(df['random_nums'])
print(f'Execution time was {time.time() - start_time} seconds.')
Execution time was 0.2911059856414795 seconds.
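A single run can be skewed by whatever else the machine is doing, so comparing Method 1 and Method 2 fairly usually means repeating each measurement. The sketch below uses timeit.repeat() and keeps the fastest batch for each approach. The function names legacy_draw and generator_draw, the reduced array size, and the repeat counts are all assumptions made for illustration.

```python
import timeit

import numpy as np

rng = np.random.default_rng()


def legacy_draw():
    # Method 1: the np.random.randint() function.
    return np.random.randint(1, 800, size=(100_000, 1))


def generator_draw():
    # Method 2: the Generator.integers() method.
    return rng.integers(1, 800, size=(100_000, 1))


# Five batches of ten calls each; the minimum is the least-disturbed batch.
legacy_best = min(timeit.repeat(legacy_draw, repeat=5, number=10))
generator_best = min(timeit.repeat(generator_draw, repeat=5, number=10))
print(f'randint: {legacy_best:.4f} s, Generator.integers: {generator_best:.4f} s')
```

Only the random number generation is timed here, which isolates the difference between the two methods from the DataFrame creation and type conversion.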
Method 3: Use datetime.datetime.now()
This example uses the datetime.datetime.now() function from the datetime library to determine the execution time of a script.
For this example, the NumPy and time libraries are not required. To follow along, download the crimes.csv file and move it to the current working directory.
import datetime

start_time = datetime.datetime.now()
df = pd.read_csv('crimes.csv', usecols=['address', 'crimedescr', 'ucr_ncic_code'])
df['address'] = [x.title() for x in df['address']]
df['crimedescr'] = [x.title() for x in df['crimedescr']]
print(df)
print(f'Execution time was {datetime.datetime.now() - start_time} seconds.')
The first line of the above code imports the datetime library. This allows us to call the now() function to return the current date and time.
The next line calls the datetime.datetime.now() function and saves the result to start_time. The execution time starts counting from this point forward.
To extend the execution time of this script, a CSV file is read in, and two (2) DataFrame columns are converted from upper case to title case. The contents are then output to the terminal.
Below are the first three (3) rows of the DataFrame.
| | address | crimedescr | ucr_ncic_code |
| 0 | 3108 Occidental Dr | 10851(A)Vc Take Veh W/O Owner | 2404 |
| 1 | 2082 Expedition Way | 459 Pc Burglary Residence | 2204 |
| 2 | 4 Palen Ct | 10851(A)Vc Take Veh W/O Owner | 2404 |
The last line calculates the script’s execution time and outputs the same to the terminal.
Execution time was 0:00:00.028006 seconds.
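Subtracting two datetime values yields a timedelta, which prints in hours:minutes:seconds form as shown above. If a plain number of seconds is preferred, the timedelta's total_seconds() method returns one. The sketch below uses a small in-memory list as a stand-in workload; that data is hypothetical and is not read from crimes.csv.

```python
import datetime

start_time = datetime.datetime.now()

# Stand-in workload (hypothetical data): title-case a list of upper-case strings.
addresses = ['3108 OCCIDENTAL DR', '2082 EXPEDITION WAY', '4 PALEN CT'] * 10000
titled = [x.title() for x in addresses]

# The difference between two datetime objects is a datetime.timedelta.
elapsed = datetime.datetime.now() - start_time

# total_seconds() converts the timedelta to a float number of seconds.
print(f'Execution time was {elapsed.total_seconds()} seconds.')
```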
Method 4: Use timeit.timeit()
This example uses the timeit.timeit() function from the timeit library to run a defined function a set number of times. The execution time of this script is recorded and displayed.
import timeit

m_lib = 'from math import factorial'
m_code = '''
def m_func():
    m_list = []
    for x in range(600):
        m_list.append(factorial(x))
'''
print(timeit.timeit(setup=m_lib, stmt=m_code, number=19000))
The first line imports the timeit library.
The following line uses a string to import the factorial function from the math library. The string is saved to m_lib.
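The next line uses a string to define a function, m_func(), that loops through a range of numbers (for x in range(600)), takes each number, determines the factorial and appends the result to a list. As written, the string only defines the function and never calls it, so the reported time covers the definition alone rather than the loop.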
The last line outputs the execution time by passing the timeit.timeit() function three (3) arguments and printing the result:
- m_lib: the string version of importing the math library.
- m_code: the string version of the function to be executed.
- number: the number of times to run this.
0.0007745000766590238
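To time the loop body itself, the function has to be called inside the timed statement. One way to arrange this, sketched below, is to move the function definition into the setup string and make the statement a call to m_func(). The name m_setup and the smaller number=100 are assumptions chosen to keep the run short, and the result will differ from the figure above.

```python
import timeit

# setup runs once: it imports factorial and defines the function.
m_setup = '''
from math import factorial

def m_func():
    m_list = []
    for x in range(600):
        m_list.append(factorial(x))
    return m_list
'''

# stmt is what gets timed: here, 100 calls to m_func().
elapsed = timeit.timeit(setup=m_setup, stmt='m_func()', number=100)
print(elapsed)
```

The value returned by timeit.timeit() is the total time for all runs, so dividing it by number gives the average time per call.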
Summary
This article has provided four (4) ways to determine a script’s execution time to select the best fit for your coding requirements.
Good Luck & Happy Coding!