How to Check the Data Type of a Pandas Series

Problem Formulation and Solution Overview

In this article, you’ll learn how to check the Data Types of a Pandas Series.

A Series is a one-dimensional (1D) labeled array. A Series can hold various Data Types, such as integers, strings, floats, or even arbitrary Python objects!

A good practice is to validate the Data Types of a Pandas Series before performing any calculations on it. Doing this helps prevent type errors in your code.

This article uses the fictitious finxters_sample.csv file, the contents of which are shown below.

FID	Rank	Solved	Avg	Yearly	Taxes
30022145	Authority	1915	89.08	$143.76	7
30022192	Beginner	1001	76.15	$143.76	8
30022331	Basic Knowledge	15	2.68	$119.40	5
30022345	Authority	1110	10.46	$131.76	3
30022157	Authority	1875	85.98	$143.76	6

💬 Question: How would we write code to check these Data Types?

We can accomplish this task by one of the following options:

Method 1: Use Pandas dtypes
Method 2: Use astype()
Method 3: Use apply()
Method 4: Use apply() and unique()
Method 5: Use apply() and value_counts()
Method 6: Use List Comprehension

Preparation

Before moving forward, please ensure the Pandas library is installed.

Then, add the following code to the top of each script. This snippet will allow the code in this article to run error-free.

import pandas as pd

After importing the Pandas library, it is referenced through its alias (pd).

Method 1: Use Pandas dtypes

This method uses dtypes. This is a property (not a function, so it takes no parentheses) that returns an object listing the Data Type of each DataFrame Series/Column.

users = pd.read_csv('finxters_sample.csv')
print(users.dtypes)

The code above reads in the finxters_sample.csv file and saves it to the DataFrame users.

Then, dtypes is appended to users and the result is output to the terminal.

This property reports the Data Type of each DataFrame Series (Column) and returns a Series containing the same.

FID	int64
Rank	object
Solved	int64
Avg	float64
Yearly	object
Taxes	int64
dtype: object
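To try this without the file on disk, the sketch below rebuilds the sample data in memory and checks one column at a time. It assumes the real finxters_sample.csv is comma-separated with the header row shown earlier; the names csv_text and numeric_cols are made up for this illustration.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Inline copy of the sample data (assumption: the real file is comma-separated).
csv_text = """FID,Rank,Solved,Avg,Yearly,Taxes
30022145,Authority,1915,89.08,$143.76,7
30022192,Beginner,1001,76.15,$143.76,8
30022331,Basic Knowledge,15,2.68,$119.40,5
30022345,Authority,1110,10.46,$131.76,3
30022157,Authority,1875,85.98,$143.76,6
"""
users = pd.read_csv(io.StringIO(csv_text))

# dtype (singular) reports the Data Type of a single Series/Column.
print(users['Avg'].dtype)

# Collect only the columns that are safe to use in calculations.
numeric_cols = [col for col in users.columns if is_numeric_dtype(users[col])]
print(numeric_cols)
```

Here Rank and Yearly are left out of numeric_cols because they hold text, which matches the dtypes output above.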

What happens when we attempt to add two (2) different data types together (an object and an int64)?

Let’s add the Yearly value in row 0 ($143.76) to the Taxes value in the same row (7).

print(users['Yearly'][0] + users['Taxes'][0])

When this code runs, an error similar to the one below occurs. Mathematical operations are not allowed between some data types, such as an object (a string in this case) and an Integer (int64).

File "C:\method-1.py", line 4, in <module>
print(users['Yearly'][0] + users['Taxes'][0])
TypeError: can only concatenate str (not "numpy.int64") to str

This error occurs because the Yearly value of $143.76 is an object (string) and needs to be converted to the proper Data Type of float64 before performing the calculation.

The code below removes the dollar sign ($) using slicing and converts the remaining text to a float (143.76).

result = (float(users['Yearly'][0][1:]) + users['Taxes'][0])
print(result)

Once converted, the addition operation is performed, saved to result, and output to the terminal.

150.76

💡Note: The users['Taxes'][0] value does not need to be converted to a float. This is because the int64 Data Type automatically converts to a float64 when the addition is performed.
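The note above can be checked with a short, standalone sketch. The variables yearly, taxes and total are illustrative stand-ins for the DataFrame values, not part of the original script.

```python
import numpy as np

# Stand-ins for users['Yearly'][0] and users['Taxes'][0].
yearly = float('$143.76'[1:])  # slice off the dollar sign, then convert
taxes = np.int64(7)

# Adding a float and an int64 needs no manual conversion.
total = yearly + taxes
print(type(total))
print(round(total, 2))
```

The result comes back as a NumPy float64, so only the string value needed explicit conversion.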

Method 2: Use astype()

This method uses astype(). This method doesn’t determine the Data Type but can convert the current Data Type to a different one.

df = pd.read_csv('finxters_sample.csv', usecols=['Yearly'])
df['Yearly'] = df['Yearly'].str[1:].astype('float64')
print(df['Yearly'])

For this example, the Yearly column from the finxters_sample.csv is read in and saved to a DataFrame df. Remember, this field is an object (a string in this case) as it contains a leading dollar sign ($).

	Yearly
0	$143.76
1	$143.76
2	$119.40
3	$131.76
4	$143.76

To convert this to a float64, the dollar sign ($) is stripped from each value in the Yearly column (str[1:]). Then astype() is called and passed one (1) argument ('float64'). The result is saved back to df['Yearly'].

As you can see, when the output is sent to the terminal, the Data Type for the above column is now a float64.

	Yearly
0	143.76
1	143.76
2	119.40
3	131.76
4	143.76

This can be confirmed by outputting the following to the terminal.

print(df['Yearly'].dtype)

float64
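astype() raises an error if any value cannot be converted. As a hedged alternative (not part of the original method), pd.to_numeric() with errors='coerce' turns unparseable values into NaN instead. The 'n/a' entry below is a made-up bad value added for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# 'n/a' is a hypothetical bad entry, not present in the sample file.
yearly = pd.Series(['$143.76', '$119.40', 'n/a'])

# Strip the leading dollar sign, then convert; bad values become NaN.
cleaned = pd.to_numeric(yearly.str.lstrip('$'), errors='coerce')

print(is_numeric_dtype(cleaned))   # the Series is now numeric
print(cleaned.isna().sum())        # number of values that failed to convert
```

Counting the NaN values afterwards shows how many entries need attention before any calculation.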

Method 3: Use apply()

The apply() method of a Series allows the coder to call a function on each element of that Series.

avg_list = pd.Series(['Avgs', 89.08, 76, 2.68, 10.46, 85.98]).apply(type)
print(avg_list)

For this example, a label ('Avgs') and the Averages for five (5) Finxter users are saved as a Series.

The apply() function is appended to this Series and passed one (1) argument, type. The result is saved to avg_list and output to the terminal.

This function (apply(type)) determines the Data Type of each element and returns a Series of those types.

0	<class 'str'>
1	<class 'float'>
2	<class 'int'>
3	<class 'float'>
4	<class 'float'>
5	<class 'float'>
dtype: object
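Per-element types are useful for locating the values that block a calculation. The sketch below is one possible follow-up, not part of the original method: it builds a boolean mask with apply() and keeps only the numeric elements. The names is_number and numeric are made up for this illustration.

```python
import pandas as pd

avgs = pd.Series(['Avgs', 89.08, 76, 2.68, 10.46, 85.98])

# A Series holding mixed Python objects reports the generic object dtype.
print(avgs.dtype)

# True where the element is a Python int or float.
is_number = avgs.apply(lambda v: isinstance(v, (int, float)))

# Show the offending elements, then convert the remainder.
print(avgs[~is_number].tolist())
numeric = avgs[is_number].astype('float64')
print(numeric.dtype)
```

Once the string label is filtered out, the remaining five values convert cleanly to float64.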

Method 4: Use apply() and unique()

This method uses apply() and unique() to retrieve the unique Data Types in the Series as a NumPy array.

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
print(misc_lst.apply(type).unique())

For this example, a Series is created containing random data and saved to misc_lst.

The apply() function is appended to misc_lst and passed one (1) argument, type. Next, the function unique() is appended to apply(type).

The output is a NumPy array in which each Data Type found in the Series appears once.

[<class 'str'> <class 'int'> <class 'float'>]
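If only a yes/no answer is needed, the same idea can be wrapped in a small check. This is a minimal sketch; has_mixed_types() is a hypothetical helper written for this article, not a Pandas function.

```python
import pandas as pd


def has_mixed_types(series):
    """Return True if the Series holds more than one Python type."""
    # nunique() counts the distinct types produced by apply(type).
    return series.apply(type).nunique() > 1


print(has_mixed_types(pd.Series(['50', 2, 3, '57', 23.87])))
print(has_mixed_types(pd.Series([1.5, 2.5, 3.5])))
```

The first Series mixes strings, integers and a float, so the check reports True; the second holds only floats and reports False.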

Method 5: Use apply() and value_counts()

This method uses apply() and value_counts() to determine the Data Type of each Series/Column element and total the number of times each type occurs.

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
result = misc_lst.apply(type).value_counts()
print(result)

For this example, a Series is created containing random data and saved to misc_lst.

The apply() function is appended to misc_lst and passed one (1) argument, type. Next, the function value_counts() is appended to apply(type). This determines how many times each Data Type occurs in the Series.

The output is the following object, which displays the total count for each Data Type.

<class 'str'>	2
<class 'int'>	2
<class 'float'>	1
dtype: int64
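The class objects in the index can be awkward to look up. One optional variation, shown here as a sketch, counts type names instead so that each total can be read by a plain string key. The variable counts is made up for this illustration.

```python
import pandas as pd

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])

# Use the type's name ('str', 'int', 'float') rather than the class itself.
counts = misc_lst.apply(lambda v: type(v).__name__).value_counts()
print(counts)

# Individual totals can then be accessed by name.
print(counts['str'])
```

This keeps the same totals as above while making the result easier to query.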

Method 6: Use List Comprehension

This method uses List Comprehension to return a List of the unique Data Types in a Series/Column.

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
result = [x for x in misc_lst.apply(type).unique()]
print(result)

For this example, a Series is created containing random data and saved to misc_lst.

Then, List Comprehension is used to traverse the array returned by apply(type).unique() and save each unique Data Type to a List of objects.

The results are output to the terminal.

[<class 'str'>, <class 'int'>, <class 'float'>]
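A comprehension can also do the work on its own, without apply() or unique(). The sketch below is an alternative to the code above, not the original approach: it uses a set comprehension, which discards duplicates automatically but does not preserve order. The names types and type_names are made up for this illustration.

```python
import pandas as pd

misc_lst = pd.Series(['50', 2, 3, '57', 23.87])

# Iterate over the Series directly; a set keeps each type only once.
types = {type(x) for x in misc_lst}

# Sort by name for a stable, readable List.
type_names = sorted(t.__name__ for t in types)
print(type_names)
```

Sorting the names gives a predictable List, which helps when comparing results between runs.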

✨A Finxter Favorite!

Summary

These six (6) methods of checking the Data Type of a Pandas Series should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!
