Problem Formulation and Solution Overview
As a Python coder, you will encounter situations where you need to apply a function to the elements of a DataFrame column.
You have a DataFrame containing user information, including the column Recurring. This column holds the monthly subscription fee, which depends on the Access Level.
The new fee for the Basic Access Level changes from $9.98/month to $11.98/month.
💬 Question: How would we update only these DataFrame Column entries?
We can accomplish this task by one of the following options:
- Method 1: Use apply() and a lambda
- Method 2: Use map() and a lambda
- Method 3: Use replace()
- Method 4: Use Pandas loc attribute
- Method 5: Use a Custom Function
💡 Note: To follow along, download the CSV (finxters.csv). Then, move this file to the current working directory.
Preparation
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
Method 1: Use Apply and a Lambda
You can apply a function to each element of a DataFrame column (a pandas Series) by calling apply() and passing an anonymous lambda function. This function is then executed on each element.
df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
# Add 2.00 only where the fee is 9.98; leave every other value unchanged.
df['Recurring'] = df['Recurring'].apply(lambda x: x+2.00 if x == 9.98 else x)
print(df)
The results are saved back to the DataFrame Column df['Recurring'], and the output is shown below.
💡 Note: The apply() function used in conjunction with a lambda works well. However, because the lambda is called once per element, performance may suffer if there are many DataFrame Column entries to adjust.
Original DataFrame (first 6 records)
  | FID | Solved | Recurring |
0 | 30022145 | 1915.0 | 11.98 |
1 | 30022192 | 1001.0 | 11.98 |
2 | 30022331 | 15.0 | 9.98 |
3 | 30022345 | 1415.0 | 10.98 |
4 | 30022359 | 1950.0 | 15.98 |
5 | 30022361 | NaN | 11.98 |
Updated DataFrame (first 6 records)
  | FID | Solved | Recurring |
0 | 30022145 | 1915.0 | 11.98 |
1 | 30022192 | 1001.0 | 11.98 |
2 | 30022331 | 15.0 | 11.98 |
3 | 30022345 | 1415.0 | 10.98 |
4 | 30022359 | 1950.0 | 15.98 |
5 | 30022361 | NaN | 11.98 |
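If you do not have the CSV at hand, the same update can be tried on a small in-memory DataFrame. The sketch below assumes only the six rows shown in the tables above; the real file contains more records.

```python
import pandas as pd

# Sample rows copied from the tables above; the real data lives in finxters.csv.
df = pd.DataFrame({
    'FID': [30022145, 30022192, 30022331, 30022345, 30022359, 30022361],
    'Solved': [1915.0, 1001.0, 15.0, 1415.0, 1950.0, None],
    'Recurring': [11.98, 11.98, 9.98, 10.98, 15.98, 11.98],
})

# Raise only the 9.98 fee by 2.00; every other value passes through unchanged.
df['Recurring'] = df['Recurring'].apply(lambda x: x + 2.00 if x == 9.98 else x)
print(df)
```

Only the row with FID 30022331 changes, which matches the updated table.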
Method 2: Use Map and a Lambda
You can also apply a function to each element of a DataFrame column by calling map() and passing an anonymous lambda function, which executes on each element.
df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
# map() calls the lambda on each value of the column.
df['Recurring'] = df['Recurring'].map(lambda x: x+2.00 if x == 9.98 else x)
print(df)
For element-wise work on a single column, map() can be faster than apply() in some instances, so it is worth timing both on your own data before choosing.
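To check that map() and apply() are interchangeable for this task, you can run both on the same column and compare the results. This is a minimal sketch on sample values taken from the tables above; the helper name bump is hypothetical and not part of the original code.

```python
import pandas as pd

# Sample fees taken from the tables above.
fees = pd.Series([11.98, 11.98, 9.98, 10.98, 15.98, 11.98], name='Recurring')

def bump(x):
    """Hypothetical helper: add 2.00 to the old Basic fee, keep other values."""
    return x + 2.00 if x == 9.98 else x

with_map = fees.map(bump)      # element-wise via map()
with_apply = fees.apply(bump)  # element-wise via apply()

# Both approaches should produce the same Series.
print(with_map.equals(with_apply))
```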
Method 3: Use Replace
This method uses the pandas replace() method, which, for this example, is passed two (2) arguments: the value to find (old) and its replacement (new). For clarity, we enclosed these values inside a List.
df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
# Replace every 9.98 in the column with 11.98.
df['Recurring'] = df['Recurring'].replace([9.98], [11.98])
print(df)
In this example, replace() executes and updates the Recurring column, swapping each occurrence of the old value for the new value.
💡 Note: The output snippet is the same as shown above.
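The list form is one of several ways to call replace(). The sketch below, using a few sample fees rather than the CSV, shows that scalar, list, and dictionary arguments give the same result for a single old/new pair.

```python
import pandas as pd

# A few sample fees; two entries carry the old Basic fee.
fees = pd.Series([11.98, 9.98, 10.98, 9.98], name='Recurring')

scalar_form = fees.replace(9.98, 11.98)      # old value, new value
list_form = fees.replace([9.98], [11.98])    # lists of old and new values
dict_form = fees.replace({9.98: 11.98})      # mapping of old -> new

print(scalar_form)
```

The dictionary form tends to read well when several fees change at once, since each old value sits next to its replacement.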
Method 4: Use Pandas Loc
This method uses the Pandas loc attribute, which gives access to entries in a DataFrame by row label (or a Boolean mask) and Column label.
df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
# Boolean mask: True for rows where the fee is 9.98.
the_filter = df.Recurring == 9.98
df.loc[the_filter, 'Recurring'] = 11.98
print(df)
In this example, a condition is created and assigned to the_filter. The condition is then applied and updates the DataFrame Column entries based on said condition.
💡 Note: The output snippet is the same as shown above.
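It can help to inspect the condition before using it. The following sketch, built on three sample rows instead of the CSV, counts how many rows the_filter selects and then performs the same loc assignment. The variable matches is introduced here for illustration only.

```python
import pandas as pd

# Three sample rows taken from the tables above.
df = pd.DataFrame({
    'FID': [30022331, 30022345, 30022359],
    'Recurring': [9.98, 10.98, 15.98],
})

# Boolean Series with one True/False entry per row.
the_filter = df['Recurring'] == 9.98

# True counts as 1, so the sum is the number of rows that will change.
matches = int(the_filter.sum())

# Assign the new fee only to the rows selected by the mask.
df.loc[the_filter, 'Recurring'] = 11.98

print(matches)
print(df)
```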
Method 5: Use a Custom Function
For more complex computations, a custom function is an ideal solution! On each iteration, fee_change() is called and applied to the DataFrame Column entry only if the condition in the lambda (x == 9.98) is met.
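def fee_change(x):
    return x+2.00

df = pd.read_csv('finxters.csv', usecols=['FID', 'Solved', 'Recurring'])
# Call fee_change() only for entries equal to 9.98.
df['Recurring'] = df['Recurring'].apply(lambda x: fee_change(x) if x == 9.98 else x)
print(df)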
💡 Note: The output snippet is the same as shown above.
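As the logic grows, one option is to move the condition into the function itself and pass the function straight to apply(), with no lambda. This is only a sketch of that variation: the constant names OLD_BASIC_FEE and FEE_INCREASE are hypothetical, and the sample values stand in for the CSV.

```python
import pandas as pd

# Hypothetical constant names; the values come from the fee change described in this article.
OLD_BASIC_FEE = 9.98
FEE_INCREASE = 2.00

def fee_change(x):
    """Return the raised fee for the old Basic fee; return other fees unchanged."""
    if x == OLD_BASIC_FEE:
        # Round to cents to keep the result tidy.
        return round(x + FEE_INCREASE, 2)
    return x

# Sample fees standing in for df['Recurring'].
fees = pd.Series([11.98, 9.98, 15.98], name='Recurring')
updated = fees.apply(fee_change)
print(updated)
```

Keeping the condition inside fee_change() means the rule lives in one place, which may be easier to extend if more Access Levels change later.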
Summary
As you can see, there are a few ways to accomplish the same task. It is up to you to decide which method best meets your coding requirements.
Good Luck & Happy Coding!