How to Change Strings to Lowercase in Pandas DataFrame


Problem Formulation

Problem: Given a Pandas DataFrame, how do you change the strings in the DataFrame to lowercase?

Example: Consider the following Pandas DataFrame:

import pandas as pd
import numpy as np

data = {
    'col_1': ['ONE', 'TWO', 'Three', np.nan, '100'],  # np.nan: the np.NAN alias was removed in NumPy 2.0
}
df = pd.DataFrame(data)
print(df)

Output:

   col_1
0    ONE
1    TWO
2  Three
3    NaN
4    100

Expected Output:

   col_1
0    one
1    two
2  three
3    NaN
4    100

When you change a string column of a pandas DataFrame to lowercase, every string in the column is converted to lowercase, while non-alphabetical characters (such as the digits in '100') and missing values (NaN) remain the same, as seen in the expected output above.

Let’s dive into the different approaches that convert the uppercase strings in the DataFrame to lowercase.

Method 1: Using str.lower()

Approach: Call the str.lower() method on the column to change its string values to lowercase. To select a column, use the square bracket notation and specify the column name within it, for example, df['column_name'].

Code:

import pandas as pd
import numpy as np

data = {
    'col_1': ['ONE', 'TWO', 'Three', np.nan, '100'],  # np.nan: the np.NAN alias was removed in NumPy 2.0
}
df = pd.DataFrame(data)
df['col_1'] = df['col_1'].str.lower()
print(df)

Output:

   col_1
0    one
1    two
2  three
3    NaN
4    100

Recap of str.lower(): It returns a lowercase version of each string in the column. Missing values such as NaN stay unchanged.
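If more than one column holds text, the same call can be repeated for each of them. The following is a minimal sketch, assuming a hypothetical DataFrame with two text columns (name, city) and one numeric column (age) that are not part of the example above:

```python
import pandas as pd
import numpy as np

# Hypothetical data: two text columns and one numeric column.
df = pd.DataFrame({
    'name': ['ALICE', 'Bob', np.nan],
    'city': ['BERLIN', 'New Delhi', 'ROME'],
    'age': [31, 45, 27],
})

# Lowercase only the columns that are known to contain text.
text_columns = ['name', 'city']
for column in text_columns:
    df[column] = df[column].str.lower()

print(df)
```

Listing the text columns explicitly keeps numeric columns such as age out of the conversion, since the .str accessor is only meant for columns that hold strings.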

Method 2: Using str.casefold()

The idea here is quite similar to the str.lower() method. The only difference is that we call the str.casefold() method on the column instead of str.lower().

Code:

import pandas as pd
import numpy as np

data = {
    'col_1': ['ONE', 'TWO', 'Three', np.nan, '100'],  # np.nan: the np.NAN alias was removed in NumPy 2.0
}
df = pd.DataFrame(data)
df['col_1'] = df['col_1'].str.casefold()
print(df)

Output:

   col_1
0    one
1    two
2  three
3    NaN
4    100

str.casefold() is a built-in Python string method that returns a case-folded copy of the string, which for most text looks the same as the lowercase version.

The casefold() method is similar to the lower() method but more aggressive: it is intended for caseless matching and also replaces characters that lower() leaves untouched. For example, the German letter 'ß' is already lowercase, so the lower() method returns it unchanged. The casefold() method, however, converts it to its equivalent 'ss'.

Example:

text = 'außen'
print(text.casefold())

text = 'außen'
print(text.lower())

Output:

aussen
außen
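The difference matters mostly when you compare strings without regard to case. Here is a small sketch of that idea in Pandas, assuming a hypothetical Series named streets that holds two spellings of the same German word:

```python
import pandas as pd

# Hypothetical column with two spellings of the same word.
streets = pd.Series(['Straße', 'STRASSE'])

lowered = streets.str.lower()
folded = streets.str.casefold()

print(lowered.tolist())  # ['straße', 'strasse'] -> still two different values
print(folded.tolist())   # ['strasse', 'strasse'] -> identical after case folding
```

If the goal is only to display lowercase text, lower() is usually enough. If the goal is to match or deduplicate values regardless of case, casefold() is likely the safer choice.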

Method 3: Using map+lambda+isinstance

Approach:

  • Check if the value is a string using the isinstance() function.
  • If it is a string, convert it to lowercase using the lower() method and return the result. Otherwise, return the value unchanged. Both cases are handled by a single lambda function.
  • Pass this lambda function to the map() method of the selected column to apply the operation to each value in that column of the DataFrame.

Code:

import pandas as pd
import numpy as np

data = {
    'col_1': ['ONE', 'TWO', 'Three', np.nan, '100'],  # np.nan: the np.NAN alias was removed in NumPy 2.0
}
df = pd.DataFrame(data)
df['col_1'] = df['col_1'].map(lambda x: x.lower() if isinstance(x,str) else x)
print(df)

Output:

   col_1
0    one
1    two
2  three
3    NaN
4    100
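The isinstance() check pays off when a column mixes strings with other types. The sketch below assumes a hypothetical two-column DataFrame and a helper named lower_if_str (both are illustrations, not part of the example above) and applies the same idea to every column:

```python
import pandas as pd
import numpy as np

def lower_if_str(value):
    """Lowercase strings; return any other value unchanged."""
    return value.lower() if isinstance(value, str) else value

# Hypothetical data: col_2 mixes strings with an integer.
df = pd.DataFrame({
    'col_1': ['ONE', 'TWO', np.nan],
    'col_2': ['Three', 100, 'FOUR'],
})

# Apply the helper to every value of every column.
lowered = df.apply(lambda column: column.map(lower_if_str))
print(lowered)

# For comparison: the .str accessor turns the non-string value 100 into NaN.
via_str = df['col_2'].str.lower()
print(via_str)
```

With the map-based approach the integer 100 survives unchanged, whereas str.lower() replaces non-string entries with NaN.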

📖Readers Digest

💎A lambda function is an anonymous function in Python. It starts with the keyword lambda, followed by a comma-separated list of zero or more arguments, followed by the colon and the return expression. For example, lambda x, y, z: x+y+z would calculate the sum of the three argument values x+y+z.

💎The built-in map() function transforms one or more iterables into a new one by applying a transformation function to the i-th elements of each iterable. The arguments are the function object and one or more iterables. If you pass n iterables as arguments, the function must take n input arguments. The return value is an iterable map object of the transformed elements. In the code above, the map() method of the column plays the same role: it applies the function to each value of the column and returns a new column.

💎Python’s built-in isinstance(object, class) function takes an object and a class as input arguments. It returns True if the object is an instance of the class. Otherwise, it returns False. Instead of a class, you can also pass a tuple of classes to check if the object is an instance of any of the classes in the tuple—such as in isinstance(object, (class_A, class_B, ...)).
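To see the three building blocks outside of Pandas, here is a short sketch on plain Python lists. The list contents are made up for illustration:

```python
# Hypothetical list that mixes strings with other types.
values = ['ONE', 'Two', 3.5, None]

# lambda + isinstance: lowercase strings, pass everything else through.
# map() is lazy, so list() is used to collect the results.
lowered = list(map(lambda x: x.lower() if isinstance(x, str) else x, values))
print(lowered)  # ['one', 'two', 3.5, None]

# map() with three iterables needs a function that takes three arguments.
sums = list(map(lambda x, y, z: x + y + z, [1, 2], [10, 20], [100, 200]))
print(sums)  # [111, 222]
```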

Bonus: Lowercase Strings in a List of Tuples in a Column

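Here’s a more advanced scenario in which each cell of the column holds a list of tuples and every string inside those tuples is converted to lowercase. Note that selecting df['text'] and calling apply() on it returns a Series, which is why the output below shows a Series rather than a DataFrame.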

import pandas as pd

data = {'text': [
    ('GERMANY', 'BERLIN'),
    ('INDIA','New Delhi')
]}, {'text': [
    ('Canada', 'Ottawa'),
    ('Italy', 'Rome')
]}

df = pd.DataFrame(data)

df = df['text'].apply(lambda col: [(x[0].lower(), x[1].lower()) for x in col])
print(df)

Output:

0    [(germany, berlin), (india, new delhi)]
1          [(canada, ottawa), (italy, rome)]
Name: text, dtype: object
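The lambda above assumes that every tuple has exactly two elements. A slightly more general sketch, assuming a hypothetical helper named lower_tuples and made-up data, handles tuples of any length and writes the result back into the column so that df stays a DataFrame:

```python
import pandas as pd

# Hypothetical data: each cell holds a list of tuples of varying length.
df = pd.DataFrame({'text': [
    [('GERMANY', 'BERLIN'), ('INDIA', 'New Delhi')],
    [('Canada', 'Ottawa', 'ONTARIO')],
]})

def lower_tuples(tuples):
    """Lowercase every string in every tuple of the list."""
    return [tuple(item.lower() for item in tup) for tup in tuples]

# Assign back to the column instead of overwriting df with a Series.
df['text'] = df['text'].apply(lower_tuples)
print(df)
```

This version still assumes that every element of every tuple is a string. If that is not guaranteed, the isinstance() check from Method 3 could be added inside the generator expression.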

Conclusion

In this tutorial, we learned three different ways of converting the string values in a specific column of a DataFrame to lowercase: str.lower(), str.casefold(), and map() combined with a lambda function and isinstance(). Please subscribe and stay tuned for more interesting solutions and discussions.
