Change Column Type in Pandas

Problem Statement: How do you change the type of a column in a pandas DataFrame?

Our mission today is to answer this question by looking at the different ways of changing the type of columns in pandas. Let’s create a pandas DataFrame that we will use throughout the tutorial to understand the solutions.

import pandas as pd
df = pd.DataFrame(
  [
    ('10', 1, 'a'),
    ('20', 2, 'b'),
    ('30', 3, 'c'),
    ('40', 4, 'd'),
  ],
  columns=list('abc')
)
print(df)
print("The type of the columns are:")
print(df.dtypes)

Output

    a  b  c
0  10  1  a
1  20  2  b
2  30  3  c
3  40  4  d
The type of the columns are:
a    object
b     int64
c    object
dtype: object

✏️Note: df.dtypes is an attribute (not a method) that returns the data type of each column.

We now have our dataframe. So, without further ado let’s dive into the different methods to change the column type.

🐼Method 1: Using to_numeric()

The most direct way to convert one or more columns of a DataFrame to numeric values is the pd.to_numeric() function. It converts columns with non-numeric data types (such as strings) to numeric types (such as integers or floating-point numbers).

  • If the column has numbers without decimals, to_numeric() will convert it to int64.
  • If the column has numbers with decimal points, to_numeric() will convert it to float64.

Syntax: pd.to_numeric(df[column name])
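The two rules above can be checked with a small sketch. The Series contents are made up for illustration, and dtype=object is set explicitly so that the inputs match the object columns used in this tutorial.

```python
import pandas as pd

# Strings that hold whole numbers only
whole = pd.Series(['1', '2', '3'], dtype=object)
# Strings where at least one value has a decimal point
mixed = pd.Series(['1.5', '2', '3'], dtype=object)

ints = pd.to_numeric(whole)
floats = pd.to_numeric(mixed)

print(ints.dtype)    # int64
print(floats.dtype)  # float64
```

A single decimal value is enough to make the whole column float64, because a column holds one dtype for all of its values.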

Example: We will change the type of the first column in our DataFrame.

import pandas as pd
df = pd.DataFrame(
  [
    ('10', 1, 'a'),
    ('20', 2, 'b'),
    ('30', 3, 'c'),
    ('40', 4, 'd'),
  ],
  columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)
# Converting column a
df['a'] = pd.to_numeric(df['a'])
print("\nAfter converting the type of columns is:")
print(df.dtypes)

Output:

Before converting the type of columns are:
a    object
b     int64
c    object
dtype: object

After converting the type of columns is:
a     int64
b     int64
c    object
dtype: object

We can also change multiple columns into numeric type by using the apply() method as shown in the following example: 

Example:

import pandas as pd
df = pd.DataFrame(
  [
    ('10', '1', 'a'),
    ('20', '2', 'b'),
    ('30', '3', 'c'),
    ('40', '4', 'd'),
  ],
  columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)
# Converting column a and column b
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)
print("\nAfter converting the type of columns is:")
print(df.dtypes)

Output:

Before converting the type of columns are:
a    object
b    object
c    object
dtype: object

After converting the type of columns is:
a     int64
b     int64
c    object
dtype: object

How to handle the errors that occur during conversion?

The to_numeric() function also takes an “errors” argument that controls what happens when a value cannot be parsed as a number.

  • errors = 'raise' (the default) raises an exception as soon as a value cannot be converted to a number.
  • errors = 'coerce' forces the conversion: every invalid value is replaced with NaN and the rest of the column is converted.
  • errors = 'ignore' suppresses the error and returns the input unchanged, so the column silently keeps its original type. Recent pandas releases deprecate this option, so prefer an explicit try/except instead.
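Here is a minimal sketch of the 'coerce' and 'raise' behaviours described above. The Series and its invalid entry 'abc' are invented for illustration.

```python
import pandas as pd

s = pd.Series(['10', '20', 'abc'], dtype=object)

# errors='coerce': the invalid entry becomes NaN, so the result is float64
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.tolist())  # [10.0, 20.0, nan]

# errors='raise' (the default): the invalid entry raises a ValueError
try:
    pd.to_numeric(s, errors='raise')
except ValueError as exc:
    print("Conversion failed:", exc)
```

Note that coercing produces a float64 column even though the valid values are whole numbers, because NaN is a floating-point value.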

🐼Method 2: Using astype()

The astype() method casts a column explicitly to a specified dtype. The target can be a built-in Python type (such as int), a NumPy dtype, or a pandas dtype.

Example:

import pandas as pd
df = pd.DataFrame(
  [
    ('10', 1, 'a'),
    ('20', 2, 'b'),
    ('30', 3, 'c'),
    ('40', 4, 'd'),
  ],
  columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)
# Converting column a
df['a'] = df['a'].astype(int)
print("\nAfter converting the type of columns is:")
print(df.dtypes)

Output:

Before converting the type of columns are:
a    object
b     int64
c    object
dtype: object

After converting the type of columns is:
a     int32
b     int64
c    object
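dtype: object

✏️Note: astype(int) uses NumPy's default integer type. The int32 shown above is what you get on Windows with NumPy versions before 2.0; on Linux and macOS (and with NumPy 2.0 or later) the result is int64.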
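astype() also accepts a dictionary that maps column names to target dtypes, which is handy when several columns need different types. The sketch below uses a small made-up DataFrame and spells out 'int64' so the result does not depend on the platform's default integer size.

```python
import pandas as pd

df = pd.DataFrame({'a': ['10', '20'], 'b': [1, 2], 'c': ['x', 'y']})

# Keys are column names, values are target dtypes.
# Columns that are not listed (here: c) are left untouched.
df = df.astype({'a': 'int64', 'b': 'float64'})
print(df.dtypes)
```

Unlike to_numeric() with errors='coerce', astype() raises an error by default if any value cannot be cast to the requested type.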

🐼Method 3: Using convert_dtypes()

The convert_dtypes() method converts each column to the best possible dtype that supports missing values (pd.NA). The dtype is determined at runtime, based on the values contained in each column.

Example:

import pandas as pd
df = pd.DataFrame(
  [
    ('10', 1, 'a'),
    ('20', 2, 'b'),
    ('30', 3, 'c'),
    ('40', 4, 'd'),
  ],
  columns=list('abc')
)
print("Before converting the type of columns are:")
print(df.dtypes)
df = df.convert_dtypes()
print("\nAfter converting the type of columns is:")
print(df.dtypes)

Output:

Before converting the type of columns are:
a    object
b     int64
c    object
dtype: object

After converting the type of columns is:
a    string
b     Int64
c    string
dtype: object

✏️Note: This method converts the dtype implicitly. Hence if you want to convert a dtype explicitly (like object to int) you should use the other methods instead.
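The benefit of these dtypes shows up when a column has missing values. In the following sketch (the column name n and its values are made up), a missing entry first forces the integers into float64, and convert_dtypes() then restores an integer type that can hold the gap.

```python
import pandas as pd

df = pd.DataFrame({'n': [1, None, 3]})
print(df['n'].dtype)  # float64, because NaN forces a float column

converted = df.convert_dtypes()
print(converted['n'].dtype)            # Int64 (nullable integer)
print(converted['n'].isna().tolist())  # [False, True, False]
```

Mind the capital I: Int64 is the pandas nullable integer dtype, whereas int64 is the NumPy dtype that cannot represent missing values.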

🐼Method 4: Using infer_objects()

The infer_objects() method is similar to the previous method: it attempts a soft conversion of columns with the object dtype to a more specific type, based on the Python objects the column actually holds. It does not parse strings.

Example:

import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40],
                   'b': ['1', '2', '3', '4'],
                   'c': ['a', 'b', 'c', 'd']
                   },
                  dtype='object'
                  )
print("Before converting the type of columns are:")
print(df.dtypes)
df = df.infer_objects()
print("\nAfter converting the type of columns is:")
print(df.dtypes)

Output:

Before converting the type of columns are:
a    object
b    object
c    object
dtype: object

After converting the type of columns is:
a     int64
b    object
c    object
dtype: object

✏️Note: In the above example, column a was converted to int64. Columns b and c were left unchanged because their values are strings, not integers. To convert a column of numeric strings such as b to an integer type, use Method 1 (to_numeric()) or Method 2 (astype()) instead.
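As a sketch of how the two kinds of conversion can be combined, the snippet below first lets infer_objects() handle column a and then parses the numeric strings in column b with to_numeric(). It reuses a trimmed version of the DataFrame from the example above.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40],
                   'b': ['1', '2', '3', '4']},
                  dtype='object')

# Soft conversion: column a holds Python ints, so it becomes int64
df = df.infer_objects()

# Hard conversion: column b holds strings, so they must be parsed
df['b'] = pd.to_numeric(df['b'])

print(df.dtypes)
```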

Conclusion

We have come to the end of our discussion on this topic. We went through four ways to change the column type of a pandas DataFrame: to_numeric(), astype(), convert_dtypes(), and infer_objects(). Feel free to drop in your queries and let us know if this article helped you. If you wish to receive daily solutions and concepts to strengthen your Python skills, please subscribe.

