How to Sum two DataFrame Columns

Problem Formulation and Solution Overview

In this article, you’ll learn how to sum two (2) DataFrame columns in Python.

To make it more fun, we have the following running scenario:

Wine-It, a subscription-based crate company, ships a different bottle of wine to subscribers each month. They have 50,000+ users and need a quick way to total the monthly fee plus taxes to invoice the customer. They need your expertise!

πŸ’¬ Question: How would we write Python code to sum two (2) DataFrame columns?

We can accomplish this task by one of the following options:


Preparation

Before our code executes successfully, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd 

Method 1: Use Addition Operator

This simplistic approach uses the addition operator (+) to sum the two (2) DataFrame columns.

df = pd.read_csv('subscribers.csv')
total = df['fees'] + df['taxes']  
df = df.assign(total=total)
print(df)

This code reads in the appropriate CSV file, then sums up the fees and taxes columns. The result saves to total.

Next, a new column is created using assign() and appended to the DataFrame.

Finally, the output is sent to the terminal.

Output (snippet)

idfeestaxestotal
03002214511.987.1219.10
1300221929.985.9815.96
23002233110.986.5817.56

Pandas achieves this by overriding its magic methods such as __add__. You can find our guide on the addition operator here:


Method 2: Use a Lambda

What takes other methods 2-3 lines to complete, the Lambda accomplishes in one! For example, the expression df.apply(lambda x: x['A'] + x['B'], axis=1) sums two DataFrame columns 'A' and 'B'.

Here’s a code example using our case study scenario outlined above:

df = pd.read_csv('subscribers.csv')
df['total'] = df.apply(lambda x: x['fees'] + x['taxes'], axis=1)
print(df)

This code reads in the appropriate CSV file, then uses a Lambda to sum the two columns and append a new column containing the totals to the DataFrame. Finally, the output is sent to the terminal.

⭐A Finxter Favorite!

Output (snippet)

idfeestaxestotal
03002214511.987.1219.10
1300221929.985.9815.96
23002233110.986.5817.56

Method 3: Use DataFrame loc

You can also sum two columns by using Pandas loc to reference the Column Names and the add() method to add those columns together. You can then assign the newly created summed column to an existing or a new DataFrame column.

Here’s an example:

df = pd.read_csv('subscribers.csv')
df.loc[:,'total'] = df.loc[:,'fees'].add(df.loc[:,'taxes'])
print(df)

This code reads in the appropriate CSV file, then uses the loc method to reference the DataFrame columns by name. The add method saves the results to a new column, total. The output is sent to the terminal.

Output (snippet)

idfeestaxestotal
03002214511.987.1219.10
1300221929.985.9815.96
23002233110.986.5817.56

Method 4: Use DataFrame iloc

Another way to add two columns is using Pandas iloc to reference the Column Numbers and the sum() method.

df = pd.read_csv('subscribers.csv')
df['total'] = df.iloc[:,1:2].sum(axis=1)
print(df)

This code reads in the appropriate CSV file, then uses the iloc method to reference the DataFrame columns by number. The sum method saves the results to a new column, total. The output is sent to the terminal.

Output (snippet)

idfeestaxestotal
03002214511.987.1219.10
1300221929.985.9815.96
23002233110.986.5817.56

Summary

These four (4) methods to sum up two (2) DataFrame columns should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!