Problem Statement: How to change the column type in pandas in Python?
This question comes up frequently on Stack Overflow, so our mission today is to answer it. We are going to learn about the different ways of changing the type of columns in pandas. Let’s create a pandas DataFrame that we will use throughout the tutorial to understand the solutions.
import pandas as pd

df = pd.DataFrame(
    [
        ('10', 1, 'a'),
        ('20', 2, 'b'),
        ('30', 3, 'c'),
        ('40', 4, 'd'),
    ],
    columns=list('abc')
)
print(df)
print("The type of the columns are:")
print(df.dtypes)
Output
    a  b  c
0  10  1  a
1  20  2  b
2  30  3  c
3  40  4  d
The type of the columns are:
a    object
b     int64
c    object
dtype: object
Note: df.dtypes is an attribute, not a method. It returns the data type of each column.
We now have our dataframe. So, without further ado let’s dive into the different methods to change the column type.
Method 1: Using to_numeric()
The most direct way to change one or more columns of a DataFrame to numeric values is the pandas to_numeric() function. It converts columns with non-numeric data types (such as strings) to numeric types (such as integers or floating-point numbers).
- If the column has numbers without decimals, to_numeric() will convert it to int64.
- If the column has numbers with decimal points, to_numeric() will convert it to float64.
Syntax: pd.to_numeric(df[column name])
Example: We will change the type of the first column in our DataFrame.
import pandas as pd

df = pd.DataFrame(
    [
        ('10', 1, 'a'),
        ('20', 2, 'b'),
        ('30', 3, 'c'),
        ('40', 4, 'd'),
    ],
    columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)

# Converting column a
df['a'] = pd.to_numeric(df['a'])
print("\nAfter converting the type of columns is:")
print(df.dtypes)
Output:
Before converting the type of columns are:
a    object
b     int64
c    object
dtype: object

After converting the type of columns is:
a     int64
b     int64
c    object
dtype: object
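The rule stated earlier (whole numbers become an integer type, numbers with decimal points become a floating-point type) can be checked with a small sketch. The `prices` DataFrame and its column names are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical sample data: one column of whole numbers, one with decimals,
# both stored as strings.
prices = pd.DataFrame({
    'whole': ['1', '2', '3'],
    'decimal': ['1.5', '2.0', '3.25'],
})

whole_converted = pd.to_numeric(prices['whole'])
decimal_converted = pd.to_numeric(prices['decimal'])

# The first dtype should be an integer type, the second a floating-point type.
print(whole_converted.dtype)
print(decimal_converted.dtype)
```

Note that a single value with a decimal point is enough to make the whole column floating-point.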
We can also change multiple columns into numeric type by using the apply() method as shown in the following example:
Example:
import pandas as pd

df = pd.DataFrame(
    [
        ('10', '1', 'a'),
        ('20', '2', 'b'),
        ('30', '3', 'c'),
        ('40', '4', 'd'),
    ],
    columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)

# Converting column a and column b
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)
print("\nAfter converting the type of columns is:")
print(df.dtypes)
Output:
Before converting the type of columns are:
a    object
b    object
c    object
dtype: object

After converting the type of columns is:
a     int64
b     int64
c    object
dtype: object
How to handle the errors that occur during conversion?
The to_numeric() function also takes an errors argument, which controls what happens when a value cannot be converted to a number:
- errors='ignore' suppresses the error and returns the input unchanged, so the column silently keeps its original dtype. This option is deprecated as of pandas 2.2.
- errors='coerce' forces the conversion and replaces every invalid value with NaN.
- errors='raise' (the default) raises an exception as soon as a value cannot be converted to a number.
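To make the difference between these options concrete, here is a minimal sketch. The `raw` Series and the `raised` flag are made up for illustration; the Series contains one value that is not a number.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a number.
raw = pd.Series(['10', '20', 'abc', '40'])

# errors='coerce' replaces anything unparseable with NaN.
coerced = pd.to_numeric(raw, errors='coerce')
print(coerced)

# errors='raise' (the default) stops with a ValueError instead.
try:
    pd.to_numeric(raw, errors='raise')
    raised = False
except ValueError as exc:
    raised = True
    print(f"Conversion failed: {exc}")
```

Because NaN is a floating-point value, the coerced column ends up with a floating-point dtype even though the valid entries are whole numbers.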
Method 2: Using astype()
The astype() method explicitly casts a column to a specified dtype. The target can be a built-in Python type, a NumPy dtype, or a pandas dtype.
Example:
import pandas as pd

df = pd.DataFrame(
    [
        ('10', 1, 'a'),
        ('20', 2, 'b'),
        ('30', 3, 'c'),
        ('40', 4, 'd'),
    ],
    columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)

# Converting column a
df['a'] = df['a'].astype(int)
print("\nAfter converting the type of columns is:")
print(df.dtypes)
Output:
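Before converting the type of columns are:
a    object
b     int64
c    object
dtype: object

After converting the type of columns is:
a     int32
b     int64
c    object
dtype: object
Note: astype(int) uses the platform’s default integer type, so column a shows up as int32 on some systems (as here) and as int64 on others.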
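astype() can also be called on the whole DataFrame with a dictionary that maps column names to target dtypes. The sketch below reuses the tutorial's DataFrame; the choice of 'int64' and 'float64' as targets is only for illustration. Naming the dtype as a string such as 'int64' also avoids depending on the platform's default integer size.

```python
import pandas as pd

df = pd.DataFrame(
    [('10', 1, 'a'), ('20', 2, 'b'), ('30', 3, 'c'), ('40', 4, 'd')],
    columns=list('abc'),
)

# Map each column name to its target dtype; columns not listed are left alone.
converted = df.astype({'a': 'int64', 'b': 'float64'})
print(converted.dtypes)
```

astype() returns a new DataFrame, so the original `df` keeps its dtypes unless the result is assigned back to it.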
Method 3: Using convert_dtypes()
The convert_dtypes() method converts each column to the best possible dtype among those that support missing values (pd.NA). The dtype is determined at runtime from the values each column contains.
Example:
import pandas as pd

df = pd.DataFrame(
    [
        ('10', 1, 'a'),
        ('20', 2, 'b'),
        ('30', 3, 'c'),
        ('40', 4, 'd'),
    ],
    columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)

df = df.convert_dtypes()
print("\nAfter converting the type of columns is:")
print(df.dtypes)
Output:
Before converting the type of columns are:
a    object
b     int64
c    object
dtype: object

After converting the type of columns is:
a    string
b     Int64
c    string
dtype: object
Note: This method chooses the dtype implicitly. Column a holds strings, so it becomes string rather than an integer type. If you want to convert a dtype explicitly (such as object to int), use one of the other methods instead.
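The benefit of dtypes that support missing values shows up when a column contains gaps. In this sketch the `scores` DataFrame is made up for illustration: the missing entry forces the column to be stored as float64 at first, and convert_dtypes() moves it to the nullable Int64 dtype.

```python
import pandas as pd

# Hypothetical data: the missing value makes pandas store 'score' as float64.
scores = pd.DataFrame({'score': [1, 2, None]})
print(scores.dtypes)

# convert_dtypes() picks the nullable integer dtype and keeps the gap as <NA>.
converted = scores.convert_dtypes()
print(converted.dtypes)
print(converted)
```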
Method 4: Using infer_objects()
The infer_objects() method is similar to the previous one: it attempts a soft conversion of columns with the object dtype to a more specific type, based on the values they actually hold.
Example:
import pandas as pd

df = pd.DataFrame(
    {
        'a': [10, 20, 30, 40],
        'b': ['1', '2', '3', '4'],
        'c': ['a', 'b', 'c', 'd'],
    },
    dtype='object'
)
print("Before converting the type of columns are:")
print(df.dtypes)

df = df.infer_objects()
print("After converting the type of columns is:")
print(df.dtypes)
Output:
Before converting the type of columns are:
a    object
b    object
c    object
dtype: object
After converting the type of columns is:
a     int64
b    object
c    object
dtype: object
Note: In the above example, column a was converted to int64. Columns b and c were left unchanged because their values are strings, not integers. To convert such columns to an integer type, use Method 1 or Method 2 instead.
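As a sketch of how the two approaches can be combined, the snippet below runs the soft conversion first and then converts the numeric strings in column b explicitly with to_numeric(). It uses a trimmed-down version of the example DataFrame without column c.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [10, 20, 30, 40], 'b': ['1', '2', '3', '4']},
    dtype='object',
)

# Soft conversion: only column a holds real integers, so only it changes.
df = df.infer_objects()

# Explicit conversion for the numeric strings in column b.
df['b'] = pd.to_numeric(df['b'])
print(df.dtypes)
```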
Conclusion
We have come to the end of our discussion on this topic. We went through four ways to change the column type of a pandas DataFrame: to_numeric(), astype(), convert_dtypes(), and infer_objects().