How to Check Data Type of a Pandas Series

Problem Formulation and Solution Overview

In this article, you’ll learn how to check the Data Types of a Pandas Series.

A Series is a 1D (one-dimensional) labeled array. A Series can hold values of various data types, such as integers, strings, floats, or even arbitrary Python objects!

A good practice is to validate the Data Types in a Pandas Series before performing any calculations on it. Doing this helps prevent type errors in your code.

This article uses the fictitious finxters_sample.csv file, the contents of which are shown below.

FID       Rank             Solved  Avg    Yearly   Taxes
30022145  Authority        1915    89.08  $143.76  7
30022192  Beginner         1001    76.15  $143.76  8
30022331  Basic Knowledge  15      2.68   $119.40  5
30022345  Authority        1110    10.46  $131.76  3
30022157  Authority        1875    85.98  $143.76  6

💬 Question: How would we write code to check these Data Types?

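We can accomplish this task by one of the following options:

Method 1: Use Pandas dtypes
Method 2: Use astype()
Method 3: Use apply()
Method 4: Use apply() and unique()
Method 5: Use apply() and value_counts()
Method 6: Use List Comprehension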

Preparation

Before moving forward, please ensure the Pandas library is installed.

Then, add the following code to the top of each script. This snippet will allow the code in this article to run error-free.

import pandas as pd

After importing the Pandas library, it is referenced in the code through its alias (pd).


Method 1: Use Pandas dtypes

This method uses dtypes. This attribute returns a Series containing the Data Type of each DataFrame Series/Column.

users = pd.read_csv('finxters_sample.csv')
print(users.dtypes)

The code above reads in the finxters_sample.csv file and saves it to the DataFrame users.

Then, dtypes is appended to users and the result is output to the terminal.

This attribute reports the Data Type of each DataFrame Series (Column) and returns an object containing the same.

FID         int64
Rank       object
Solved      int64
Avg       float64
Yearly     object
Taxes       int64
dtype: object
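
Since finxters_sample.csv is a fictitious file, the sketch below rebuilds the same rows from an in-memory string so it can run without the file (the csv_text variable is an assumption made for this illustration). It also shows dtype, which reports the Data Type of a single Series, and the helper checks from pandas.api.types. The label Pandas prints for text columns may differ between versions, so the checks test for numeric types instead of comparing against a specific name.

```python
import io

import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_numeric_dtype

# Assumption: an in-memory stand-in for the fictitious finxters_sample.csv.
csv_text = """FID,Rank,Solved,Avg,Yearly,Taxes
30022145,Authority,1915,89.08,$143.76,7
30022192,Beginner,1001,76.15,$143.76,8
30022331,Basic Knowledge,15,2.68,$119.40,5
30022345,Authority,1110,10.46,$131.76,3
30022157,Authority,1875,85.98,$143.76,6
"""

users = pd.read_csv(io.StringIO(csv_text))

# dtypes covers the whole DataFrame; dtype covers a single Series.
print(users.dtypes)
print(users['Avg'].dtype)

# Helper checks that return True or False for one column at a time.
taxes_is_int = is_integer_dtype(users['Taxes'])
avg_is_float = is_float_dtype(users['Avg'])
yearly_is_numeric = is_numeric_dtype(users['Yearly'])
print(taxes_is_int, avg_is_float, yearly_is_numeric)
```

Checks like these can be placed in front of a calculation to confirm a column is numeric before any arithmetic is attempted.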

What happens when we attempt to add two (2) different data types together (an object and an int64)?

Let’s add the Yearly value in row 0 ($143.76) to the Taxes value in the same row (7).

print(users['Yearly'][0] + users['Taxes'][0])

When this code runs, an error similar to the one below occurs. The Yearly value is an object (a string in this case) and the Taxes value is an Integer (int64), and a number cannot be added to a string.

  File "C:\method-1.py", line 4, in <module>
    print(users['Yearly'][0] + users['Taxes'][0])
TypeError: can only concatenate str (not "numpy.int64") to str

This error occurs because the Yearly value of $143.76 is an object (string) and needs to be converted to a numeric Data Type, such as a float, before performing the calculation.

The code below removes the dollar sign ($) using slicing and converts this value to a float (143.76).

result = (float(users['Yearly'][0][1:]) + users['Taxes'][0])
print(result)

Once converted, the addition operation is performed, saved to result, and output to the terminal.

150.76

💡Note: The users['Taxes'][0] value does not need to be converted to a float. This is because the int64 Data Type automatically converts to a float64 when the addition is performed.
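
The conversion above can also be guarded with a type check, so the value is only converted when it is still text. The following is a minimal sketch, assuming a small hand-built DataFrame that mirrors the first rows of the sample data. It uses lstrip('$') in place of slicing, which is a swapped-in alternative that leaves the value untouched when no dollar sign is present.

```python
import pandas as pd

# Assumption: two columns mirroring the first rows of the sample data.
users = pd.DataFrame({'Yearly': ['$143.76', '$143.76'], 'Taxes': [7, 8]})

yearly = users['Yearly'][0]
taxes = users['Taxes'][0]

# Inspect the element types before combining them.
print(type(yearly).__name__, type(taxes).__name__)

# Convert only when the value is still text; lstrip('$') drops a leading dollar sign.
if isinstance(yearly, str):
    yearly = float(yearly.lstrip('$'))

result = yearly + taxes
print(result)
print(type(result).__name__)
```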

Method 2: Use astype()

This method uses astype(). This method doesn’t determine the Data Type but can convert the current Data Type to a different one.

df = pd.read_csv('finxters_sample.csv', usecols=['Yearly'])
df['Yearly'] = df['Yearly'].str[1:].astype('float64')
print(df['Yearly'])

For this example, the Yearly column from the finxters_sample.csv is read in and saved to a DataFrame df. Remember, this field is an object (a string in this case) as it contains a leading dollar sign ($).

    Yearly
0  $143.76
1  $143.76
2  $119.40
3  $131.76
4  $143.76

To convert this to a float64, the dollar sign ($) is stripped from each value in the Yearly column (str[1:]). Then astype() is called and passed one (1) argument ('float64'). The results save back to df['Yearly'].

As you can see, when the output is sent to the terminal, the Data Type for the above column is now a float64.

   Yearly
0  143.76
1  143.76
2  119.40
3  131.76
4  143.76

This can be confirmed by outputting the following to the terminal.

print(df['Yearly'].dtype)
float64
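
astype() raises an error if any value cannot be converted. As a possible alternative (not part of the original method), the sketch below uses pd.to_numeric() with errors='coerce', which turns unconvertible entries into NaN instead. The sample values, including the malformed 'n/a' entry, are assumptions made for this illustration.

```python
import pandas as pd
from pandas.api.types import is_float_dtype

# Assumption: sample values, including one deliberately malformed entry.
# dtype=object mirrors the object column described in this article.
yearly = pd.Series(['$143.76', '$119.40', 'n/a'], dtype=object)

# Remove the dollar sign as a literal character, then convert.
cleaned = yearly.str.replace('$', '', regex=False)
amounts = pd.to_numeric(cleaned, errors='coerce')

print(amounts.dtype)

# Count the entries that could not be converted.
bad_count = int(amounts.isna().sum())
print(bad_count)
```

Counting the NaN values afterwards is one way to find out how many entries need attention before the column is used in a calculation.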

Method 3: Use apply()

When called on a Series, the apply() method invokes a function on each value of that Series.

avg_list = pd.Series(['Avgs', 89.08, 76, 2.68, 10.46, 85.98]).apply(type)
print(avg_list)

For this example, a label ('Avgs') and the Averages for five (5) Finxter users are saved as a Series.

The apply() function is appended to the Series and passed one (1) argument, type. The result saves to avg_list and is output to the terminal.

This function (apply(type)) determines the Data Type for each element and returns an object indicating the same.

0      <class 'str'>
1    <class 'float'>
2      <class 'int'>
3    <class 'float'>
4    <class 'float'>
5    <class 'float'>
dtype: object
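
The difference between the Data Type of a whole Series (dtype) and the type of each element (apply(type)) can be sketched as follows. The two small Series are assumptions made for this illustration.

```python
import pandas as pd

mixed = pd.Series(['Avgs', 89.08, 76])
numbers = pd.Series([89.08, 76, 2.68])

# One dtype for the whole Series versus one Python type per element.
print(mixed.dtype)
mixed_types = mixed.apply(type).tolist()
print(mixed_types)

# A purely numeric Series is stored with a single float dtype,
# so every element, including 76, reports a float type.
print(numbers.dtype)
number_types = numbers.apply(type).tolist()
print(number_types)
```

A mixed Series keeps each element as its original Python object, which is why apply(type) can report several different types for it.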

Method 4: Use apply() and unique()

This method uses apply() and unique() to retrieve the unique Data Types in the Series.

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
print(misc_lst.apply(type).unique()) 

For this example, a Series is created containing random data and saved to misc_lst.

The apply() function is appended to misc_lst and passed one (1) argument, type. Next the function unique() is appended to apply(type).

The output is a NumPy array of the unique Data Type objects.

[<class 'str'> <class 'int'> <class 'float'>]
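
One use for the unique Data Types is to flag a Series that mixes several of them. The following is a minimal sketch, where the is_mixed and type_names variables are names chosen for this illustration.

```python
import pandas as pd

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])

unique_types = misc_lst.apply(type).unique()

# More than one distinct element type means the Series is mixed.
is_mixed = len(unique_types) > 1
print(is_mixed)

# Names are often easier to read than class objects.
type_names = [t.__name__ for t in unique_types]
print(type_names)
```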

Method 5: Use apply() and value_counts()

This method uses apply() and value_counts() to determine the Data Types of each Series/Column element and totals the number of times each occurs.

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
result = misc_lst.apply(type).value_counts()
print(result)

For this example, a Series is created containing random data and saved to misc_lst.

The apply() function is appended to misc_lst and passed one (1) argument, type. Next the function value_counts() is appended to apply(type). This determines how many times each different Data Type occurs in the Series.

This output results in the following object, which displays each Data Type’s total count.

<class 'str'>      2
<class 'int'>      2
<class 'float'>    1
dtype: int64
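
A variation on this method counts the type names instead of the class objects, which makes the result easier to look up by label. The sketch below also passes normalize=True to value_counts() to get the share of each Data Type. The lambda and the variable names are assumptions made for this illustration.

```python
import pandas as pd

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])

# Map each element to the name of its type, then count the names.
type_names = misc_lst.map(lambda value: type(value).__name__)
counts = type_names.value_counts()

# normalize=True returns proportions instead of raw counts.
shares = type_names.value_counts(normalize=True)

print(counts)
print(shares)
```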

Method 6: Use List Comprehension

This method uses List Comprehension to return a List of the unique Data Types in a Series/Column.

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
result = [x for x in misc_lst.apply(type).unique()]
print(result)

For this example, a Series is created containing random data and saved to misc_lst.

Then, apply() and unique() determine the unique Data Types, and List Comprehension traverses the result and saves each Data Type to a List of objects.

The results are output to the terminal.

[<class 'str'>, <class 'int'>, <class 'float'>]

✨A Finxter Favorite!
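
The List Comprehension in Method 6 can also work on the Series elements directly, without apply(). This is a sketch of that variation, not the method shown above. It collects one type name per element and then uses dict.fromkeys() to drop duplicates while keeping the order in which they first appear.

```python
import pandas as pd

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])

# One type name per element, in Series order.
all_types = [type(value).__name__ for value in misc_lst]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_names = list(dict.fromkeys(all_types))

print(all_types)
print(unique_names)
```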

Summary

These six (6) methods of checking the Data Type of a Pandas Series should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!