Problem Formulation
Given the following DataFrame df
:
import pandas as pd df = pd.DataFrame([{'A':1, 'B':2, 'C':2, 'D':4}, {'A':4, 'B':8, 'C':3, 'D':1}, {'A':2, 'B':7, 'C':1, 'D':2}, {'A':3, 'B':5, 'C':1, 'D':2}]) print(df) ''' A B C D 0 1 2 2 4 1 4 8 3 1 2 2 7 1 2 3 3 5 1 2 '''
π¬ Challenge: How to apply a function f
to each cell in the DataFrame?
For example, you may want to apply a function that replaces all odd values with the value 'odd'
.
import pandas as pd df = pd.DataFrame([{'A':1, 'B':2, 'C':2, 'D':4}, {'A':4, 'B':8, 'C':3, 'D':1}, {'A':2, 'B':7, 'C':1, 'D':2}, {'A':3, 'B':5, 'C':1, 'D':2}]) def f(cell): if cell%2 == 1: return 'odd' return cell # ... <Apply Function f to each cell> ... print(df) ''' A B C D 0 odd 2 2 4 1 4 8 odd odd 2 2 odd odd 2 3 odd odd odd 2 '''
Solution: DataFrame applymap()
The Pandas DataFrame df.applymap()
method returns a new DataFrame where the function f
is applied to each cell of the original DataFrame df
. You can pass any function object as a single argument into the df.applymap()
function, either defined as a lambda expression or a normal function.
Example 1: Replace Odd Values in DataFrame
Here’s an example where each cell of the DataFrame is checked against whether it is an odd value. If so, it is replaced with the string 'odd'
:
def f(cell): if cell%2 == 1: return 'odd' return cell df_new = df.applymap(f) print(df_new) ''' A B C D 0 odd 2 2 4 1 4 8 odd odd 2 2 odd odd 2 3 odd odd odd 2 '''
Example 2: Create Two DataFrames with Even and Odd Values Replaced
A slightly advanced example uses two lambda functions to create two new DataFrames where one has all odd and the other has all even values replaced:
import pandas as pd df = pd.DataFrame([{'A':1, 'B':2, 'C':2, 'D':4}, {'A':4, 'B':8, 'C':3, 'D':1}, {'A':2, 'B':7, 'C':1, 'D':2}, {'A':3, 'B':5, 'C':1, 'D':2}]) df_even = df.applymap(lambda x: 'odd' if x%2 else x) df_odd = df.applymap(lambda x: x if x%2 else 'even') print(df_even) ''' A B C D 0 odd 2 2 4 1 4 8 odd odd 2 2 odd odd 2 3 odd odd odd 2 ''' print(df_odd) ''' A B C D 0 1 even even even 1 even even 3 1 2 even 7 1 even 3 3 5 1 even '''
We used the concept of a ternary operator to concisely define the replacement function using the keyword lambda
to create a function object “on the fly”.
π Recommended Tutorial: Understanding the Ternary Operator in Python
What to Do for Huge Data Sets?
To apply a function to each cell of a DataFrame if the data set has millions of rows, a better and more memory-efficient way would be to use a CSV file and change its values (update and replace data) line by line. Read one line, update its values and write the updated values in a new file (or modify an existing one).
π Recommended Tutorials:
This tutorial idea was proposed by Finxter student Kyriakos. β€οΈ