Problem Formulation and Solution Overview
Wine-It, a subscription-based crate company, ships a different bottle of wine to subscribers each month. They have 50,000+ users and need a quick way to total the monthly fee plus taxes to invoice the customer. They need your expertise!
π¬ Question: How would we write Python code to sum two (2) DataFrame columns?
We can accomplish this task by one of the following options:
- Method 1: Use the Addition Operator
- Method 2: Use a Lambda
- Method 3: Use DataFrame
loc
andadd
- Method 4: Use DataFrame
iloc
andsum
Preparation
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($
), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($
). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter>
key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
Method 1: Use Addition Operator
This simplistic approach uses the addition operator (+
) to sum the two (2) DataFrame columns.
df = pd.read_csv('subscribers.csv') total = df['fees'] + df['taxes'] df = df.assign(total=total) print(df)
This code reads in the appropriate CSV file, then sums up the fees
and taxes
columns. The result saves to total
.
Next, a new column is created using assign()
and appended to the DataFrame.
Finally, the output is sent to the terminal.
Output (snippet)
id | fees | taxes | total | |
0 | 30022145 | 11.98 | 7.12 | 19.10 |
1 | 30022192 | 9.98 | 5.98 | 15.96 |
2 | 30022331 | 10.98 | 6.58 | 17.56 |
Pandas achieves this by overriding its magic methods such as __add__
. You can find our guide on the addition operator here:
Method 2: Use a Lambda
What takes other methods 2-3 lines to complete, the Lambda accomplishes in one! For example, the expression df.apply(lambda x: x['A'] + x['B'], axis=1)
sums two DataFrame columns 'A'
and 'B'
.
Here’s a code example using our case study scenario outlined above:
df = pd.read_csv('subscribers.csv') df['total'] = df.apply(lambda x: x['fees'] + x['taxes'], axis=1) print(df)
This code reads in the appropriate CSV file, then uses a Lambda to sum the two columns and append a new column containing the totals to the DataFrame. Finally, the output is sent to the terminal.
βA Finxter Favorite!
Output (snippet)
id | fees | taxes | total | |
0 | 30022145 | 11.98 | 7.12 | 19.10 |
1 | 30022192 | 9.98 | 5.98 | 15.96 |
2 | 30022331 | 10.98 | 6.58 | 17.56 |
Method 3: Use DataFrame loc
You can also sum two columns by using Pandas loc
to reference the Column Names and the add()
method to add those columns together. You can then assign the newly created summed column to an existing or a new DataFrame column.
Here’s an example:
df = pd.read_csv('subscribers.csv') df.loc[:,'total'] = df.loc[:,'fees'].add(df.loc[:,'taxes']) print(df)
This code reads in the appropriate CSV file, then uses the loc method to reference the DataFrame columns by name. The add method saves the results to a new column, total
. The output is sent to the terminal.
Output (snippet)
id | fees | taxes | total | |
0 | 30022145 | 11.98 | 7.12 | 19.10 |
1 | 30022192 | 9.98 | 5.98 | 15.96 |
2 | 30022331 | 10.98 | 6.58 | 17.56 |
Method 4: Use DataFrame iloc
Another way to add two columns is using Pandas iloc
to reference the Column Numbers and the sum()
method.
df = pd.read_csv('subscribers.csv') df['total'] = df.iloc[:,1:2].sum(axis=1) print(df)
This code reads in the appropriate CSV file, then uses the iloc
method to reference the DataFrame columns by number. The sum
method saves the results to a new column, total
. The output is sent to the terminal.
Output (snippet)
id | fees | taxes | total | |
0 | 30022145 | 11.98 | 7.12 | 19.10 |
1 | 30022192 | 9.98 | 5.98 | 15.96 |
2 | 30022331 | 10.98 | 6.58 | 17.56 |
Summary
These four (4) methods to sum up two (2) DataFrame columns should give you enough information to select the best one for your coding requirements.
Good Luck & Happy Coding!