How to Apply a Function to Column Elements


Problem Formulation and Solution Overview

As a Python coder, you may need to apply a function to the elements of a DataFrame column.

To make it more fun, we have the following running scenario:

You have a DataFrame containing user information, including the column Recurring. This column holds each user's monthly subscription fee, which depends on their Access Level.

The fee for the Basic Access Level changes from $9.98/month to $11.98/month.

💬 Question: How would we update only these DataFrame Column entries?

We can accomplish this task using any of the five methods below.

💡 Note: To follow along, click here to download the CSV. Then, move this file to the current working directory.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, open a terminal in your IDE and execute the command below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.


$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation is successful, a message appears in the terminal confirming it.


Feel free to view the PyCharm installation guide for the required library.


Add the following line to the top of each code snippet so that the code in this article runs error-free.

import pandas as pd 
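If you cannot download the CSV, the minimal sketch below builds a small stand-in DataFrame instead. It assumes the FID, Solved, and Recurring columns look like the sample output shown in Method 1, from which these six rows were reconstructed; the real finxters.csv may contain more rows and columns. You can substitute this df for the pd.read_csv() line in any snippet.

```python
import numpy as np
import pandas as pd

# Assumption: rows reconstructed from the sample output in Method 1;
# the real finxters.csv may contain more rows and columns.
df = pd.DataFrame({
    'FID': [30022145, 30022192, 30022331, 30022345, 30022359, 30022361],
    'Solved': [1915.0, 1001.0, 15.0, 1415.0, 1950.0, np.nan],
    'Recurring': [11.98, 11.98, 9.98, 10.98, 15.98, 11.98],
})
print(df)
```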

Method 1: Use Apply and a Lambda

You can apply a function to each element of a DataFrame column (a pandas Series) by calling apply() and passing it an anonymous lambda function. apply() then executes the lambda once for each element.

df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
df['Recurring'] = df['Recurring'].apply(lambda x: x+2.00 if x == 9.98 else x)
print(df)

The results are saved back to the DataFrame column df['Recurring'], and the output is shown below.

💡 Note: Combining apply() with a lambda works well. However, apply() calls the lambda once per entry in Python, so it can be slow on columns with many entries.

Original DataFrame (first six rows)

        FID  Solved  Recurring
0  30022145  1915.0      11.98
1  30022192  1001.0      11.98
2  30022331    15.0       9.98
3  30022345  1415.0      10.98
4  30022359  1950.0      15.98
5  30022361     NaN      11.98

Updated DataFrame (first six rows)

        FID  Solved  Recurring
0  30022145  1915.0      11.98
1  30022192  1001.0      11.98
2  30022331    15.0      11.98
3  30022345  1415.0      10.98
4  30022359  1950.0      15.98
5  30022361     NaN      11.98
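When performance matters, a vectorized alternative avoids calling Python code per entry. The sketch below is not one of the article's methods: it swaps in Series.where(), which keeps each value where the condition is True and substitutes the second argument where it is False. It uses a small stand-in Series (values from the sample output above) and checks that both approaches agree.

```python
import pandas as pd

# Stand-in for df['Recurring'] (values from the sample output above).
recurring = pd.Series([11.98, 11.98, 9.98, 10.98, 15.98, 11.98], name='Recurring')

# Element-wise: apply() calls the lambda once per entry.
via_apply = recurring.apply(lambda x: x + 2.00 if x == 9.98 else x)

# Vectorized: keep values where the condition (not 9.98) is True,
# and substitute 11.98 where it is False.
via_where = recurring.where(recurring != 9.98, 11.98)

print(via_apply.equals(via_where))  # True
```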

Method 2: Use Map and a Lambda

You can also apply a function to each element of a column by calling the Series map() method and passing it an anonymous lambda function, which executes once per element.

df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
df['Recurring'] = df['Recurring'].map(lambda x: x+2.00 if x == 9.98 else x)
print(df)

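map() is designed for element-wise transformations and also accepts a dict or Series as a lookup table. For simple cases like this one, it may be slightly faster than apply(), although both still call the lambda once per entry.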
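Besides a lambda, map() accepts a dict that maps old values to new ones. A minimal sketch, using a stand-in Series built from the sample output, shows the one pitfall of that form: entries not found in the dict become NaN, so they must be filled back in from the original column.

```python
import pandas as pd

# Stand-in for df['Recurring'] (values from the sample output).
recurring = pd.Series([11.98, 11.98, 9.98, 10.98, 15.98, 11.98], name='Recurring')

# Dict form: only keys present in the dict are mapped; all other entries become NaN.
looked_up = recurring.map({9.98: 11.98})
print(looked_up.isna().sum())  # 5

# Fill the unmatched entries back in from the original column.
updated = recurring.map({9.98: 11.98}).fillna(recurring)
print(updated.tolist())  # [11.98, 11.98, 11.98, 10.98, 15.98, 11.98]
```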


Method 3: Use Replace

This method uses the pandas Series replace() method (not Python's built-in str.replace()), which, for this example, is passed two (2) arguments: the value to find (to_replace) and its replacement (value). For clarity, we enclosed each value inside a list.

df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
df['Recurring'] = df['Recurring'].replace([9.98], [11.98])
print(df)

In this example, replace() returns a new Series in which each 9.98 is replaced with 11.98. Assigning that Series back to df['Recurring'] updates the column.
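replace() also accepts a dict of {old_value: new_value} pairs, which some readers find easier to scan than two parallel lists. The sketch below, using a stand-in Series from the sample output, shows that the list form and the dict form give the same result. Note that replace() returns a new Series and leaves the original untouched unless you assign the result back.

```python
import pandas as pd

# Stand-in for df['Recurring'] (values from the sample output).
recurring = pd.Series([11.98, 11.98, 9.98, 10.98, 15.98, 11.98], name='Recurring')

# List form, as in Method 3: each item in the first list is replaced by
# the item at the same position in the second list.
list_form = recurring.replace([9.98], [11.98])

# Equivalent dict form: {old_value: new_value}.
dict_form = recurring.replace({9.98: 11.98})

print(list_form.equals(dict_form))  # True
```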

💡 Note: The output snippet is the same as shown above.


Method 4: Use Pandas Loc

This method uses the pandas loc indexer, which accesses DataFrame entries by row label (or a boolean mask) and column label.

df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
the_filter = df.Recurring == 9.98
df.loc[the_filter, 'Recurring'] = 11.98
print(df)

In this example, the_filter is a boolean Series that is True wherever Recurring equals 9.98.
Passing it to loc together with the column label 'Recurring' selects only those entries, and the assignment sets them to 11.98.
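All the methods above compare floats with ==, which works here because the fees are read straight from the CSV. If a fee is ever the result of arithmetic, tiny rounding errors can make an exact match fail. The sketch below uses hypothetical data in which one fee carries a simulated 1e-12 error, and builds the mask with numpy.isclose() (default tolerances) instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first fee carries a simulated tiny rounding error.
df = pd.DataFrame({'Recurring': [9.98 + 1e-12, 10.98, 9.98]})

exact_filter = df.Recurring == 9.98            # misses the first row
close_filter = np.isclose(df.Recurring, 9.98)  # tolerant comparison

print(exact_filter.sum(), close_filter.sum())  # 1 2

# Use the tolerant mask with loc, as in Method 4.
df.loc[close_filter, 'Recurring'] = 11.98
print(df.Recurring.tolist())  # [11.98, 10.98, 11.98]
```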

💡 Note: The output snippet is the same as shown above.


Method 5: Use a Custom Function

For more complex computations, a custom function is a good fit. Here, apply() calls the lambda once per entry, and the lambda calls fee_change() only when the entry equals 9.98 (the old Basic fee).

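df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])

def fee_change(x):
    # Raise the fee by $2.00 (from $9.98 to $11.98 for the Basic Access Level)
    return x + 2.00

# Call fee_change() only for entries that equal the old Basic fee
df['Recurring'] = df['Recurring'].apply(lambda x: fee_change(x) if x == 9.98 else x)
print(df)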
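A variation on this method moves the condition into the custom function and passes the old and new fees as arguments, using the args parameter of apply(). The function name change_fee and its parameters are hypothetical; this is a sketch on a stand-in Series built from the sample output, not the article's original code.

```python
import pandas as pd

# Stand-in for df['Recurring'] (values from the sample output).
recurring = pd.Series([11.98, 11.98, 9.98, 10.98, 15.98, 11.98], name='Recurring')

def change_fee(x, old_fee, new_fee):
    """Hypothetical helper: return new_fee for entries equal to old_fee, else x."""
    return new_fee if x == old_fee else x

# apply() forwards the tuple in args= as extra positional arguments.
updated = recurring.apply(change_fee, args=(9.98, 11.98))
print(updated.tolist())  # [11.98, 11.98, 11.98, 10.98, 15.98, 11.98]
```

Because the fees are parameters, the same function can handle future price changes for other Access Levels without editing its body.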

💡 Note: The output snippet is the same as shown above.


Summary

As you can see, there are a few ways to accomplish the same task. It is up to you to decide which method best meets your coding requirements.
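Before choosing, it can help to confirm that the methods really produce the same column. The sketch below runs all five on a small stand-in DataFrame (values from the sample output) and compares each result against the replace() version; loc runs on a copy because it modifies the DataFrame in place.

```python
import pandas as pd

# Stand-in for the Recurring data (values from the sample output).
df = pd.DataFrame({'Recurring': [11.98, 11.98, 9.98, 10.98, 15.98, 11.98]})

def fee_change(x):
    return x + 2.00

results = {
    'apply + lambda': df['Recurring'].apply(lambda x: x + 2.00 if x == 9.98 else x),
    'map + lambda': df['Recurring'].map(lambda x: x + 2.00 if x == 9.98 else x),
    'replace': df['Recurring'].replace([9.98], [11.98]),
    'custom function': df['Recurring'].apply(lambda x: fee_change(x) if x == 9.98 else x),
}

# loc assigns in place, so run it on a copy to leave df untouched.
df_loc = df.copy()
df_loc.loc[df_loc.Recurring == 9.98, 'Recurring'] = 11.98
results['loc'] = df_loc['Recurring']

# Compare every result against the replace() version.
reference = results['replace']
for name, result in results.items():
    print(f'{name:<16}{result.equals(reference)}')
```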

Good Luck & Happy Coding!