Python Find Longest String in a DataFrame Column

💬 Challenge: Given a Pandas DataFrame, how do you find the longest string in a given column?

import pandas as pd

df = pd.DataFrame(['a', 'aaa', 'aaaaa'], columns=['A'])
print(df)

       A
0      a
1    aaa
2  aaaaa  # <-- This is what we want!

We're going to discuss different variants of this problem next. Let's get started with the easiest one!

Method 1: Find Length of Longest String in DataFrame Column

To find the length of the longest string in a DataFrame column, use the expression df.COL.str.len().max() replacing COL with your custom column name.

import pandas as pd

df = pd.DataFrame(['a', 'aaa', 'aaaaa'], columns=['A'])
print(df.A.str.len().max())
# 5

This is how the expression df.COL.str.len().max() works step by step:

  • df.COL accesses the column COL of your DataFrame df.
  • df.COL.str provides you with different string methods to apply to this column.
  • df.COL.str.len() replaces each string in the column with its integer length.
  • df.COL.str.len().max() gets the maximum column value, i.e., the length of the longest string.
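The steps above can be sketched with the intermediate result stored in a variable. This is only an illustration: the variable names lengths and longest_len are not from the original article, and the bracket notation df['A'] is used instead of df.A because it also works for column names that are not valid Python identifiers.

```python
import pandas as pd

df = pd.DataFrame(['a', 'aaa', 'aaaaa'], columns=['A'])

# Step 1-3: access the column and compute the length of each string
lengths = df['A'].str.len()
print(lengths.tolist())
# [1, 3, 5]

# Step 4: take the maximum of those lengths
longest_len = lengths.max()
print(longest_len)
# 5
```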

Method 2: Find Index of Longest String in DataFrame Column

To find the index of the longest string in a DataFrame column, use the expression df.COL.str.len().idxmax() replacing COL with your custom column name.

import pandas as pd

df = pd.DataFrame(['a', 'aaa', 'aaaaa'], columns=['A'])
print(df.A.str.len().idxmax())
# 2

This is how the expression df.COL.str.len().idxmax() works step by step:

  • df.COL accesses the column COL of your DataFrame df.
  • df.COL.str provides you with different string methods to apply to this column.
  • df.COL.str.len() replaces each string in the column with its integer length.
  • df.COL.str.len().idxmax() gets the index of the maximum column value, i.e., the index of the longest string in the column.
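Two details of idxmax() are worth seeing in a small example: it returns the index label (not necessarily an integer position), and if several strings share the maximum length it returns the label of the first one. The DataFrame below, with its custom labels 'x', 'y', 'z' and its tied strings, is hypothetical data made up for this sketch.

```python
import pandas as pd

# Hypothetical data: custom index labels, two strings tied for longest
df = pd.DataFrame({'A': ['aaaaa', 'a', 'bbbbb']}, index=['x', 'y', 'z'])

idx = df['A'].str.len().idxmax()
print(idx)
# x
```

Because the result is a label, it can be used directly for label-based lookups in the same DataFrame.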

Method 3: Get Longest String in DataFrame Column

To get the longest string in a DataFrame column, first get the index of that string using df.COL.str.len().idxmax(), replacing COL with your custom column name. Then use standard indexing such as df['COL'][idx] to access the value at index idx in column 'COL'.

import pandas as pd

df = pd.DataFrame(['a', 'aaa', 'aaaaa'], columns=['A'])

# 1. Get index of longest string in column
idx = df.A.str.len().idxmax()
# Index: 2

# 2. Get longest string using df['A'][idx]
print('Longest string in column:', df['A'][idx])
# Longest string in column: aaaaa

This is how the two-step lookup works step by step:

  • df.COL accesses the column COL of your DataFrame df.
  • df.COL.str provides you with different string methods to apply to this column.
  • df.COL.str.len() replaces each string in the column with its integer length.
  • df.COL.str.len().idxmax() gets the index of the maximum column value, i.e., the index of the longest string in the column.
  • df['A'][idx] gets the value in column 'A' at index idx.
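As a possible variation on the lookup above, df.loc[idx, 'A'] fetches the same value in a single label-based step, and a Boolean mask can return every string that ties for the maximum length. This is a sketch under assumptions: the tied sample data and the names lengths, longest, and all_longest are made up for illustration and do not appear in the original article.

```python
import pandas as pd

# Hypothetical data: two strings tied for longest
df = pd.DataFrame({'A': ['aaaaa', 'a', 'bbbbb']}, index=['x', 'y', 'z'])

lengths = df['A'].str.len()

# Single lookup by row label and column name
longest = df.loc[lengths.idxmax(), 'A']
print(longest)
# aaaaa

# Boolean mask keeps every row whose length equals the maximum
all_longest = df.loc[lengths == lengths.max(), 'A'].tolist()
print(all_longest)
# ['aaaaa', 'bbbbb']
```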


