5 Best Ways to Get the Data Type of a Column in Pandas

πŸ’‘ Problem Formulation: When working with Pandas DataFrames in Python, it’s crucial to know the data types of the columns for data preprocessing, analysis, and visualization tasks. Suppose you have a DataFrame, and you’re interested in knowing the data type of the ‘Price’ column to ensure it’s numeric before performing aggregations. The desired output is a simple indicator of the column’s data type.

Method 1: Using the dtypes Attribute

Pandas DataFrame’s dtypes attribute returns the data types of all columns in a DataFrame. This is helpful when you want a general overview of the data types within your data structure. To get the type of a specific column, you can index dtypes with the column name.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Price': [19.99, 25.50, 'Not Available'], 'Product': ['T-Shirt', 'Shoes', 'Socks']})
print(df.dtypes['Price'])

Output:

object

This code snippet first imports the pandas package. A DataFrame is then created with a ‘Price’ column containing mixed data types. By accessing the dtypes attribute and indexing it with ‘Price’, we retrieve the data type of the ‘Price’ column, which in this case is ‘object’ (typically meaning strings or mixed types rather than a uniform numeric column).
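If the goal from the problem formulation is simply to verify that ‘Price’ is numeric before aggregating, pandas also ships type-checking helpers under pd.api.types. Here is a minimal sketch building on the same mixed-type DataFrame:

```python
import pandas as pd

# Same mixed-type DataFrame as above
df = pd.DataFrame({'Price': [19.99, 25.50, 'Not Available'],
                   'Product': ['T-Shirt', 'Shoes', 'Socks']})

# is_numeric_dtype returns False for an 'object' column,
# flagging the problem before any aggregation is attempted
price_is_numeric = pd.api.types.is_numeric_dtype(df['Price'])
print(price_is_numeric)  # False
```

This reads more directly than comparing dtype objects by hand and works for any numeric dtype (int, float, etc.), not just one specific type.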

Method 2: Using the info() Method

The info() method of a DataFrame provides a concise summary of the DataFrame, including the column data types. While it’s not used to extract the data type directly, it’s useful for a quick visual check of column types.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Price': [19.99, 25.50, 30.00], 'Product': ['T-Shirt', 'Shoes', 'Socks']})
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Price   3 non-null      float64
 1   Product 3 non-null      object 
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes

In this code snippet, we import pandas and define a DataFrame. We then invoke the info() method, which prints a summary including the index dtype, the column names, their non-null counts, and their data types. For the ‘Price’ column, it clearly shows a ‘float64’ data type, meaning it contains floating-point numbers.
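Although info() prints to standard output rather than returning a value, its summary can still be captured programmatically via the buf parameter. A small sketch:

```python
import io
import pandas as pd

df = pd.DataFrame({'Price': [19.99, 25.50, 30.00],
                   'Product': ['T-Shirt', 'Shoes', 'Socks']})

# info() accepts a writable buffer via 'buf' instead of printing to stdout
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# The captured text contains the dtype of each column
print('float64' in summary)  # True
```

This turns the visual check into something a script can act on, e.g. logging the summary or grepping it for a dtype.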

Method 3: Using the astype() Method

Pandas’ astype() method is typically used to cast a pandas object to a specified data type. Although it’s primarily for type conversion, you can use it to check a column’s type indirectly by attempting to cast the column to the expected type and catching any errors.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Price': [19.99, 25.50, 30.00]})
try:
    df['Price'] = df['Price'].astype(float)
    print("Column 'Price' data type is float.")
except ValueError:
    print("Column 'Price' is not of type float.")

Output:

Column 'Price' data type is float.

The code snippet attempts to convert the ‘Price’ column to a float. If the conversion is successful, we can infer that its data type was already float or a compatible type (e.g., int), and it confirms this by printing a message. If a ValueError occurs, it means the conversion failed and the column is not of type float.
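To see the except branch fire, the same pattern can be run against a ‘Price’ column containing a non-numeric placeholder, like the mixed-type example from Method 1. A sketch:

```python
import pandas as pd

# 'Not Available' cannot be parsed as a float, so astype raises ValueError
df = pd.DataFrame({'Price': [19.99, 25.50, 'Not Available']})
try:
    df['Price'] = df['Price'].astype(float)
    is_float = True
except ValueError:
    is_float = False

print("Column 'Price' data type is float." if is_float
      else "Column 'Price' is not of type float.")
```

Here the conversion fails, so the message reports that ‘Price’ is not of type float, which is exactly the signal you would want before attempting aggregations.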

Method 4: Using the dtypes Attribute with a Custom Function

You can also create a custom function to return the data type of a specified column. This might be overkill for a single column but could be useful when working with several DataFrames or performing repetitive tasks.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Price': [19.99, 25.50, 30.00]})

# Define a custom function to get the type of a column
def get_column_dtype(dataframe, column_name):
    return dataframe.dtypes[column_name]

print(get_column_dtype(df, 'Price'))

Output:

float64

This custom function get_column_dtype simplifies the process of getting the data type of a column by taking in a DataFrame and a column name and returning the type using the dtypes attribute. It’s a reusable piece of code that can be handy in larger scripts or projects.
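The same helper scales naturally beyond a single column; for instance, a dict comprehension can build a per-column type report for any DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Price': [19.99, 25.50, 30.00],
                   'Product': ['T-Shirt', 'Shoes', 'Socks']})

def get_column_dtype(dataframe, column_name):
    return dataframe.dtypes[column_name]

# Apply the helper over every column for a quick type report
report = {col: get_column_dtype(df, col) for col in df.columns}
print(report)  # {'Price': dtype('float64'), 'Product': dtype('O')}
```

This is the kind of repetitive task the function is meant for, and the resulting dict is easy to feed into validation logic.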

Bonus One-Liner Method 5: Using the Series Accessors dt and str

For specialized data types like datetime or string, pandas provides Series accessors (dt and str) to treat a column as datetime-like or string-like respectively. These accessors raise an AttributeError when used on an incompatible column, so successfully accessing them implicitly confirms the data type.

Here’s an example:

import pandas as pd

# Create a sample DataFrame with datetime objects
df = pd.DataFrame({'LaunchDate': pd.to_datetime(['2021-01-01', '2021-06-15', '2021-12-31'])})

print(type(df['LaunchDate'].dt))

Output:

<class 'pandas.core.indexes.accessors.DatetimeProperties'>

In this example, we have a DataFrame with a ‘LaunchDate’ column containing dates. When we access it using df['LaunchDate'].dt, Pandas treats it as a datetime object, and checking its type confirms that it is indeed a DatetimeProperties object, implicitly confirming the data type of the column is datetime.
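The flip side is what makes this an implicit type check: accessing .dt on a non-datetime column raises an AttributeError. A sketch using the string-valued ‘Product’ column:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['T-Shirt', 'Shoes', 'Socks']})

# .dt is only valid for datetime-like Series; on an object column
# pandas raises AttributeError, which we can catch as a type test
try:
    df['Product'].dt
    is_datetime_like = True
except AttributeError:
    is_datetime_like = False

print(is_datetime_like)  # False
```

The str accessor behaves analogously: it only works on columns whose values support string operations.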

Summary/Discussion

  • Method 1: The dtypes Attribute. Strengths: Direct and straightforward way to get the data type. Weaknesses: Gives the data type of all columns, not just one specific column.
  • Method 2: The info() Method. Strengths: Provides an overview of all columns’ data types at a glance. Weaknesses: Not programmatically useful for extracting the type.
  • Method 3: The astype() Method. Strengths: Can be used to check and confirm data types by casting. Weaknesses: Inelegant for just checking types, as it’s intended for conversion.
  • Method 4: Custom Function with dtypes. Strengths: Reusable in different contexts. Weaknesses: Overhead of writing a function for a simple task.
  • Method 5: Accessors dt and str. Strengths: Confirms specialized data types implicitly. Weaknesses: Only applicable to datetime and string types.