How to Apply a Function to Each Cell in a Pandas DataFrame?

5/5 - (3 votes)

Problem Formulation

Given the following DataFrame df:

import pandas as pd


df = pd.DataFrame([{'A':1, 'B':2, 'C':2, 'D':4},
                   {'A':4, 'B':8, 'C':3, 'D':1},
                   {'A':2, 'B':7, 'C':1, 'D':2},
                   {'A':3, 'B':5, 'C':1, 'D':2}])

print(df)
'''
   A  B  C  D
0  1  2  2  4
1  4  8  3  1
2  2  7  1  2
3  3  5  1  2
'''

πŸ’¬ Challenge: How to apply a function f to each cell in the DataFrame?

For example, you may want to apply a function that replaces all odd values with the value 'odd'.

import pandas as pd


df = pd.DataFrame([{'A':1, 'B':2, 'C':2, 'D':4},
                   {'A':4, 'B':8, 'C':3, 'D':1},
                   {'A':2, 'B':7, 'C':1, 'D':2},
                   {'A':3, 'B':5, 'C':1, 'D':2}])


def f(cell):
    if cell%2 == 1:
        return 'odd'
    return cell


# ... <Apply Function f to each cell> ...

print(df)
'''
     A    B    C    D
0  odd    2    2    4
1    4    8  odd  odd
2    2  odd  odd    2
3  odd  odd  odd    2
'''

Solution: DataFrame applymap()

The Pandas DataFrame df.applymap() method returns a new DataFrame where the function f is applied to each cell of the original DataFrame df. You can pass any function object as a single argument into the df.applymap() function, either defined as a lambda expression or a normal function.

Example 1: Replace Odd Values in DataFrame

Here’s an example where each cell of the DataFrame is checked against whether it is an odd value. If so, it is replaced with the string 'odd':

def f(cell):
    if cell%2 == 1:
        return 'odd'
    return cell


df_new = df.applymap(f)

print(df_new)
'''
     A    B    C    D
0  odd    2    2    4
1    4    8  odd  odd
2    2  odd  odd    2
3  odd  odd  odd    2
'''

Example 2: Create Two DataFrames with Even and Odd Values Replaced

A slightly advanced example uses two lambda functions to create two new DataFrames where one has all odd and the other has all even values replaced:

import pandas as pd


df = pd.DataFrame([{'A':1, 'B':2, 'C':2, 'D':4},
                   {'A':4, 'B':8, 'C':3, 'D':1},
                   {'A':2, 'B':7, 'C':1, 'D':2},
                   {'A':3, 'B':5, 'C':1, 'D':2}])


df_even = df.applymap(lambda x: 'odd' if x%2 else x)
df_odd = df.applymap(lambda x: x if x%2 else 'even')

print(df_even)
'''
     A    B    C    D
0  odd    2    2    4
1    4    8  odd  odd
2    2  odd  odd    2
3  odd  odd  odd    2
'''

print(df_odd)
'''
      A     B     C     D
0     1  even  even  even
1  even  even     3     1
2  even     7     1  even
3     3     5     1  even
'''

We used the concept of a ternary operator to concisely define the replacement function using the keyword lambda to create a function object “on the fly”.

🌍 Recommended Tutorial: Understanding the Ternary Operator in Python

What to Do for Huge Data Sets?

To apply a function to each cell of a DataFrame if the data set has millions of rows, a better and more memory-efficient way would be to use a CSV file and change its values (update and replace data) line by line. Read one line, update its values and write the updated values in a new file (or modify an existing one).

🌍 Recommended Tutorials:


This tutorial idea was proposed by Finxter student Kyriakos. ❀️