5 Best Ways to Print the Length of Elements in All Columns of a DataFrame Using applymap in Python

💡 Problem Formulation: Often when dealing with text data in pandas DataFrames, it’s necessary to know the length of each element within columns to perform certain operations or data pre-processing steps. For example, one might need to pad strings or truncate them to a fixed length. Given a DataFrame, we’d like to apply a function to each element to get the length of the string it contains, creating a new DataFrame with the same shape showing the length of each entry.

Method 1: Basic applymap Usage

The applymap() method in pandas applies a given function to every element of a DataFrame and returns a new DataFrame of the same shape. Because it works element-wise across all columns, it is a natural fit for computing the string length of each data point. Note that applymap() has been deprecated since pandas 2.1 in favor of DataFrame.map(), which works the same way element-wise. The following example demonstrates the method with a simple DataFrame of strings.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': ['apple', 'orange'], 'B': ['banana', 'cherry']})

# Apply the len function to each element in the DataFrame
length_df = df.applymap(len)

print(length_df)

Output:

   A  B
0  5  6
1  6  6

This code snippet creates a DataFrame with two columns A and B with fruit names as strings. The applymap(len) function applies Python’s built-in len() function to each element of the DataFrame, creating a new DataFrame where each value is the length of the corresponding string in the original DataFrame.
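For code that has to run on pandas versions both before and after 2.1, where DataFrame.map() was introduced as the replacement for applymap(), a small wrapper can pick whichever method is available. The following is a minimal sketch under that assumption; the helper name elementwise is hypothetical and not part of pandas.

```python
import pandas as pd

def elementwise(df, func):
    """Apply func to every cell of df.

    Hypothetical helper: uses DataFrame.map when the installed pandas
    provides it (2.1 and later) and falls back to applymap otherwise.
    """
    if hasattr(pd.DataFrame, "map"):
        return df.map(func)
    return df.applymap(func)

# Same sample data as in the example above
df = pd.DataFrame({'A': ['apple', 'orange'], 'B': ['banana', 'cherry']})

length_df = elementwise(df, len)
print(length_df)
```

The result has the same shape and values as the applymap(len) example above.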

Method 2: Using a Custom Function

While applymap() can be used with built-in functions, such as len(), it can also apply custom functions to DataFrame elements. Writing a custom function, even if it wraps a built-in function, can add clarity or handle more complex scenarios.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': ['red', 'blue', 'green'], 'B': ['yellow', 'purple', 'orange']})

# Define a custom function to return the length of a string
def length_of_string(string):
    return len(string)

# Apply the custom function to each element in the DataFrame
length_df = df.applymap(length_of_string)

print(length_df)

Output:

   A  B
0  3  6
1  4  6
2  5  6

This code defines a custom function length_of_string() that returns the length of its input string. The custom function is then used within applymap() to transform each element of the DataFrame into its length.
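One "more complex scenario" where a custom function helps is a DataFrame containing missing or non-string values, for which a bare len() would raise a TypeError. The sketch below is illustrative only: the function name safe_length and the choice to report 0 for non-string cells are assumptions, not something pandas prescribes.

```python
import pandas as pd

def safe_length(value):
    """Return len(value) for strings and 0 for anything else.

    Assumption for this sketch: missing or non-string cells count as length 0.
    """
    if isinstance(value, str):
        return len(value)
    return 0

# Sample data with one missing entry in column A
df = pd.DataFrame({'A': ['red', None, 'green'], 'B': ['yellow', 'purple', 'orange']})

# Use DataFrame.map where available (pandas 2.1+), otherwise applymap
apply_elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
length_df = apply_elementwise(safe_length)

print(length_df)
```

Depending on the use case, returning a missing-value marker instead of 0 may be more appropriate; the point is only that the custom function gives you one place to make that decision.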

Method 3: applymap with Lambda Functions

A lambda function is a small anonymous function that can be used as a one-time function inside other functions such as applymap(). In this method, we utilize a lambda function to find the length of each element directly within the applymap() call, making the code concise.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': ['cat', 'dog'], 'B': ['parrot', 'crow']})

# Use a lambda function to apply the len function to each element
length_df = df.applymap(lambda x: len(x))

print(length_df)

Output:

   A  B
0  3  6
1  3  4

The lambda function in this example, lambda x: len(x), takes each element x and returns its length. This anonymous function is passed directly to applymap, which applies it to each element and produces a DataFrame of lengths.
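Because lambda x: len(x) only wraps len(), a lambda starts to pay off when a little extra logic is needed per element. As a hedged illustration, the sketch below measures each string after stripping surrounding whitespace; the sample data with padded strings is invented for this example.

```python
import pandas as pd

# Invented sample data: some entries carry leading or trailing spaces
df = pd.DataFrame({'A': [' cat ', 'dog'], 'B': ['parrot', 'crow  ']})

# Use DataFrame.map where available (pandas 2.1+), otherwise applymap
apply_elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# The lambda strips whitespace before measuring each string
stripped_lengths = apply_elementwise(lambda x: len(x.strip()))

print(stripped_lengths)
```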

Method 4: Incorporating Conditions

Conditions can be added to the custom function or lambda function inside applymap() to handle special cases. For example, you may only want to calculate the length of elements that satisfy a certain condition.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': ['yes', 'no', 'maybe'], 'B': ['true', 'false', 'uncertain']})

# Use applymap with a lambda that checks for string length greater than 2
length_df = df.applymap(lambda x: len(x) if len(x) > 2 else 'Too short')

print(length_df)

Output:

           A  B
0          3  4
1  Too short  5
2          5  9

In this code, the lambda function contains a condition: it returns the length of the string only if that length is greater than 2; otherwise, it returns 'Too short'. In the sample data, only 'no' (two characters) triggers the fallback. This allows for custom behavior based on conditions directly within the applymap() call.
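Returning a string such as 'Too short' alongside integers leaves the affected column holding a mix of types, which makes later arithmetic awkward. One possible alternative, sketched here as a design option rather than a recommendation from the original method, is to return NaN for entries that fail the condition so the column stays numeric.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['yes', 'no', 'maybe'], 'B': ['true', 'false', 'uncertain']})

# Use DataFrame.map where available (pandas 2.1+), otherwise applymap
apply_elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Return NaN instead of a text label when the string is too short
length_df = apply_elementwise(lambda x: len(x) if len(x) > 2 else np.nan)

print(length_df)

# Numeric operations still work; NaN entries are skipped by default
print(length_df['A'].sum())
```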

Bonus One-Liner Method 5: Using applymap with __len__()

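Pandas natively supports vectorized string functions through the str accessor, and Series.str.len() returns the length of each string in a Series. The str accessor is defined on a Series, however, not on individual elements, so it cannot be called inside applymap(), which passes plain Python objects to the function. Within applymap(), the closest equivalent is to call each string’s own __len__() method.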

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': ['simple', 'example'], 'B': ['applymap', 'length']})

# Call each element's __len__() method via applymap
length_df = df.applymap(lambda x: x.__len__())

print(length_df)

Output:

   A  B
0  6  8
1  7  6

This technique uses the built-in __len__() method of string objects, which is what len(x) calls under the hood. Here, the lambda function calls x.__len__() for each element x to get its length. The result is the same as in the previous methods; the only difference is the explicit use of the string’s own __len__() method.
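For comparison, the str accessor can be used column by column with DataFrame.apply(), which passes each column to the function as a Series. This is a minimal sketch of that column-wise alternative; it is not an applymap() call, and it assumes every column holds string data.

```python
import pandas as pd

df = pd.DataFrame({'A': ['simple', 'example'], 'B': ['applymap', 'length']})

# apply() hands each column (a Series) to the lambda,
# so the vectorized Series.str.len() can be used directly
length_df = df.apply(lambda col: col.str.len())

print(length_df)

# Longest entry per column, e.g. to choose a padding width
max_lengths = length_df.max()
print(max_lengths)
```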

Summary/Discussion

  • Method 1: applymap with len function. Strengths: Simple and straightforward. Weaknesses: Doesn’t allow customization.
  • Method 2: Custom function. Strengths: More readable and can be extended for more complex cases. Weaknesses: Slightly more verbose.
  • Method 3: Lambda function. Strengths: Compact and inline, no need to define an external function. Weaknesses: Can be less readable for complex functions.
  • Method 4: Conditions within applymap. Strengths: Can handle complex logic within data transformation. Weaknesses: Can make code more complex and harder to read.
  • Method 5: One-liner lambda with __len__(). Strengths: Demonstrates direct usage of object methods in applymap calls. Weaknesses: Slightly obscure due to the direct use of dunder methods.