5 Best Ways to Get the Data Types of Columns in Python

💡 Problem Formulation: When working with structured data in Python, especially in data analysis and machine learning, it’s crucial to understand the data type of each column in your dataset. Say you have a Pandas DataFrame and you want to quickly check the data types so that you perform the correct operations on each column. The desired output is a listing or mapping of each column name to its corresponding data type.

Method 1: Using the DataFrame dtypes Attribute

The dtypes attribute of a Pandas DataFrame returns a Series with the data type of each column. It’s incredibly straightforward and built-in, requiring no extra method calls or parameters.

Here’s an example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1.0, 2.0, 3.0],
    'C': ['a', 'b', 'c']
})

# Get the data types of each column
print(df.dtypes)

Output:

A      int64
B    float64
C     object
dtype: object

This snippet creates a DataFrame with three columns of different types: integer, float, and object (typically string). It then prints out the data type of each column using the dtypes attribute.
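If you need the mapping form mentioned in the problem formulation rather than a printed Series, a minimal sketch is shown below. The variable name dtype_map is only illustrative, and the exact dtype names can vary with your platform and pandas version.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1.0, 2.0, 3.0],
    'C': ['a', 'b', 'c']
})

# Build a plain dict of column name -> dtype name (as a string)
dtype_map = {column: str(dtype) for column, dtype in df.dtypes.items()}
print(dtype_map)

# A single column (a Series) exposes its dtype directly
print(df['B'].dtype)
```

Converting each dtype to a string makes the mapping easy to log or compare, at the cost of losing the dtype object itself.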

Method 2: Using the info() Method

The info() method of a DataFrame prints a concise summary of the DataFrame, including the data type of each column as well as non-null counts and memory usage.

Here’s an example:

# Using the same DataFrame as in Method 1
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       3 non-null      int64  
 1   B       3 non-null      float64 
 2   C       3 non-null      object 
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

This code calls the info() method on our DataFrame, which prints a summary that includes the data type of each column as part of its output, along with additional information.
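Because info() prints its summary and returns None, you cannot assign the result to a variable directly. One possible workaround, sketched here, is to pass a text buffer through the buf parameter; the names buffer and summary_text are illustrative.

```python
import io

import pandas as pd

# Same DataFrame as in Method 1
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1.0, 2.0, 3.0],
    'C': ['a', 'b', 'c']
})

# info() writes into the buffer instead of stdout and returns None
buffer = io.StringIO()
result = df.info(buf=buffer)

# The captured summary is now an ordinary string
summary_text = buffer.getvalue()
print(result)
print(summary_text)
```

This is handy when the summary should go to a log file rather than the console.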

Method 3: Using the astype() Method

The astype() method casts a pandas object to a specified data type. It always requires a target type, so it is a conversion tool rather than an inspection tool: chaining dtypes after it shows the types that result from the cast, not the original ones.

Here’s an example:

# Using the same DataFrame as in Method 1
print(df.astype('object').dtypes)

Output:

A    object
B    object
C    object
dtype: object

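This code uses astype('object') to cast every column to the ‘object’ type and then prints the resulting data types with dtypes. Every column is reported as ‘object’ because of the cast, so the original types are not shown. The DataFrame df itself is unchanged, since astype() returns a new object.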
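To make the difference between converting and inspecting concrete, the sketch below casts a single column by passing a mapping to astype() and then compares the dtypes before and after. The variable name converted is only illustrative.

```python
import pandas as pd

# Same DataFrame as in Method 1
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1.0, 2.0, 3.0],
    'C': ['a', 'b', 'c']
})

# Cast only column 'A'; astype() returns a new DataFrame
converted = df.astype({'A': 'float64'})

# The converted copy reports the new type for 'A'
print(converted.dtypes)

# The original DataFrame still reports its original types
print(df.dtypes)
```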

Method 4: Using a Dictionary Comprehension with the type() Function

A comprehension can apply the type() function to a sample value from each DataFrame column, here the first one. It’s a more manual approach and may not be as efficient as using pandas built-in methods.

Here’s an example:

# Using the same DataFrame as in Method 1
# .iloc[0] reads the first row by position, so it works with any index
column_types = {column: type(df[column].iloc[0]) for column in df.columns}
print(column_types)

Output:

{'A': <class 'numpy.int64'>, 'B': <class 'numpy.float64'>, 'C': <class 'str'>}

This snippet uses a dictionary comprehension to iterate over each column, checking the type of the first element within each column. Numeric columns return NumPy scalar types such as numpy.int64 rather than Python’s built-in int and float. The result will not always be accurate, as different rows of an object column could contain values of different types.
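The following sketch illustrates how checking only the first element can mislead. It uses a hypothetical single-column DataFrame named mixed, collects every type present with Series.map(), and asks pandas for a summary label through pandas.api.types.infer_dtype().

```python
import pandas as pd

# Hypothetical column that mixes an int, a str, and a float
mixed = pd.DataFrame({'D': [1, 'two', 3.0]})

# Looking at the first value alone suggests an integer column
first_value_type = type(mixed['D'].iloc[0])
print(first_value_type)

# Collecting the type of every value reveals the mix
types_present = set(mixed['D'].map(type))
print(types_present)

# infer_dtype() returns a string label describing the values
inferred = pd.api.types.infer_dtype(mixed['D'])
print(inferred)
```

Scanning every value costs more than sampling one, so this is better suited to debugging than to hot code paths.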

Bonus One-Liner Method 5: Using applymap() with type()

The applymap() function applies a function to each element of the DataFrame. Combined with the type() function, it shows the Python type of every individual element, which is more useful for spotting mixed-type columns than for retrieving column types. Note that applymap() has been deprecated since pandas 2.1 in favor of DataFrame.map(), which behaves the same way.

Here’s an example:

# Using the same DataFrame as in Method 1
print(df.applymap(type))

Output:

                A               B               C
0  <class 'int'>  <class 'float'>  <class 'str'>
1  <class 'int'>  <class 'float'>  <class 'str'>
2  <class 'int'>  <class 'float'>  <class 'str'>

This code applies the function type() to every element in the DataFrame, which displays the data type of each individual element. However, as shown, it can be quite verbose for large DataFrames.
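To tame that verbosity, one option is to collapse the element-wise types into a short list of type names per column. The sketch below does this with plain Python over each column’s values instead of applymap(); the name type_names is illustrative.

```python
import pandas as pd

# Same DataFrame as in Method 1
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1.0, 2.0, 3.0],
    'C': ['a', 'b', 'c']
})

# For each column, collect the distinct Python type names of its values.
# tolist() converts NumPy scalars to built-in Python scalars first.
type_names = {
    column: sorted({type(value).__name__ for value in df[column].tolist()})
    for column in df.columns
}
print(type_names)
```

A column whose list contains more than one name holds mixed types and probably deserves a closer look.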

Summary/Discussion

  • Method 1: dtypes Attribute. Fast and built-in. Reports only ‘object’ for any column that holds Python objects, so strings, lists, and mixed values all look the same.
  • Method 2: info() Method. Offers a detailed summary including data types. Provides more information than strictly necessary if only data types are needed.
  • Method 3: astype() Method. Converts types rather than reporting them, so the dtypes shown afterwards are the cast types, not the originals. Using it just to get datatypes is unconventional and may confuse readers of code.
  • Method 4: Dictionary Comprehension with type(). Inspects actual values rather than column dtypes. It’s less efficient and the result can be misleading when a column holds mixed data types.
  • Bonus Method 5: applymap() with type(). Good for element-wise type checking. It is not suited for straightforward column type retrieval due to verbosity.