Extracting Hour of Day from Datetime Objects in Python Pandas

💡 Problem Formulation: When working with time series data in Pandas, a common task is to isolate specific components, such as the hour of the day, from a datetime index or column. For instance, given a DataFrame with a datetime column, we may want to extract just the hour component (e.g., 0-23) for further analysis or feature engineering. This article discusses multiple methods to efficiently accomplish this task.

Method 1: Using dt.hour Accessor

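The most straightforward way to extract the hour component from a Pandas datetime Series is the dt.hour accessor. It is vectorized, meaning it works on the whole column at once without a Python-level loop, which makes it the default choice for beginners and experienced developers alike. It requires the Series to have a datetime64 dtype: a column of strings must first be converted with pd.to_datetime(), otherwise accessing .dt raises an AttributeError.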

Here’s an example:

import pandas as pd

# Create a datetime series
series = pd.Series(pd.date_range("2023-01-01 12:45", periods=3, freq="h"))

# Extract the hour of the day
hours = series.dt.hour

print(hours)

Output (pandas 2.x; earlier versions report dtype: int64):

0    12
1    13
2    14
dtype: int32

This code snippet creates a series of datetime objects spaced one hour apart, and then simply accesses the hour component using .dt.hour. Each datetime object is broken down into its hour component, represented as an integer from 0 to 23.
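Since the problem statement mentions both DataFrame columns and datetime indexes, here is a minimal sketch of both cases. The DataFrame, its column names ("timestamp", "value", "hour"), and its values are hypothetical. On a DataFrame column the hour is reached through .dt, while a DatetimeIndex exposes .hour directly, with no .dt in between.

```python
import pandas as pd

# Hypothetical event log with a datetime64 column named "timestamp"
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-01 08:15", "2023-01-01 13:30", "2023-01-01 23:05"]
    ),
    "value": [10, 20, 30],
})

# DataFrame column (a Series): use the .dt accessor and store the result
# as a new feature column
df["hour"] = df["timestamp"].dt.hour

# DatetimeIndex: .hour is available directly on the index
indexed = df.set_index("timestamp")
index_hours = indexed.index.hour

print(df)
print(list(index_hours))
```

Both approaches return one integer per row, so the hour can be stored as a new column or used directly for grouping.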

Method 2: Using lambda and apply()

Another method involves using a combination of the apply() function with a lambda expression to extract the hour. While not as concise as the first method, it offers flexibility for more complex manipulations.

Here’s an example:

import pandas as pd

# Create a datetime series
series = pd.Series(pd.date_range("2023-01-01 12:45", periods=3, freq="h"))

# Use lambda function to extract hour
hours = series.apply(lambda x: x.hour)

print(hours)

Output:

0    12
1    13
2    14
dtype: int64

This snippet passes each Timestamp in the series to the lambda function, which returns its .hour attribute. Because apply() calls a Python function once per element, it is slower than dt.hour on large data. Its advantage is that the function can contain arbitrary logic, such as conditionals or calls to other helpers, rather than a single attribute lookup.
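To illustrate the kind of logic that justifies apply(), the following sketch maps each timestamp to a part-of-day label. The part_of_day function and its hour boundaries are hypothetical choices made only for this example, not a standard Pandas feature.

```python
import pandas as pd

# Timestamps at 04:00, 10:00, 16:00 and 22:00
series = pd.Series(pd.date_range("2023-01-01 04:00", periods=4, freq="6h"))


def part_of_day(ts):
    """Hypothetical bucketing rule: map a timestamp's hour to a label."""
    h = ts.hour
    if 5 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 22:
        return "evening"
    return "night"


# apply() calls part_of_day once per element
labels = series.apply(part_of_day)
print(labels.tolist())  # ['night', 'morning', 'afternoon', 'night']
```

A named function is used instead of a lambda here because multi-branch logic is easier to read and test that way; the mechanics of apply() are the same.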

Method 3: Using datetime.strptime() and hour Property

For strings representing date and time, the datetime.strptime() function from Python’s standard library can be used to parse the string into a datetime object, which can then be used to extract the hour using the hour property.

Here’s an example:

from datetime import datetime

# A datetime string
datetime_str = "2023-01-01 12:45:00"

# Parse string to datetime
parsed_datetime = datetime.strptime(datetime_str, "%Y-%m-%d %H:%M:%S")

# Extract the hour
hour = parsed_datetime.hour

print(hour)

Output: 12

This example parses a date-time string into a datetime object with datetime.strptime() and then reads the hour with .hour. This method suits individual date-time strings and plain Python code, as opposed to data already stored as Pandas datetime objects.
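When the strings already live in a Pandas Series, each one can be parsed with strptime() in a loop, but pd.to_datetime() accepts the same format codes and converts the whole column at once so that dt.hour can be used. The sketch below shows both on hypothetical sample strings.

```python
from datetime import datetime

import pandas as pd

fmt = "%Y-%m-%d %H:%M:%S"
raw = pd.Series(["2023-01-01 12:45:00", "2023-01-01 18:05:30", "2023-01-02 00:10:00"])

# Element-wise: Method 3 applied to every string
hours_strptime = [datetime.strptime(s, fmt).hour for s in raw]

# Vectorized: parse the column once, then use the dt accessor (Method 1)
parsed = pd.to_datetime(raw, format=fmt)
hours_vectorized = parsed.dt.hour

print(hours_strptime)              # [12, 18, 0]
print(hours_vectorized.tolist())   # [12, 18, 0]
```

Passing an explicit format to pd.to_datetime() avoids format inference and fails loudly if a string does not match.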

Method 4: Using Timestamp.hour For Single Timestamps

If working with an individual Timestamp object, directly using the hour property is a quick and straightforward method to extract the hour component.

Here’s an example:

import pandas as pd

# A single Timestamp object
timestamp = pd.Timestamp("2023-01-01 12:45")

# Extract the hour
hour = timestamp.hour

print(hour)

Output: 12

This code creates a single Timestamp object and retrieves the hour of the day from it using the .hour property. This method is tailored for situations where we are handling individual Timestamps rather than a series or DataFrame.
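One detail worth keeping in mind with single Timestamps is that .hour reports the wall-clock hour in the Timestamp's own time zone. The sketch below uses UTC and America/New_York as example zones; on 1 January, New York is five hours behind UTC, so the same instant yields a different hour after conversion.

```python
import pandas as pd

# A timezone-aware Timestamp in UTC
utc_ts = pd.Timestamp("2023-01-01 12:45", tz="UTC")

# The same instant expressed in New York local time (EST, UTC-5 in January)
ny_ts = utc_ts.tz_convert("America/New_York")

print(utc_ts.hour)  # 12
print(ny_ts.hour)   # 7
```

If hour-of-day features are meant to reflect local time, convert the time zone before extracting the hour.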

Bonus One-Liner Method 5: Using List Comprehension

Though not exclusive to Pandas, list comprehension offers a Pythonic way to extract hours from a collection of datetime objects in a concise one-line command.

Here’s an example:

import pandas as pd

# Create a datetime series
series = pd.Series(pd.date_range("2023-01-01 12:45", periods=3, freq="h"))

# Extract hours using list comprehension
hours = [time.hour for time in series]

print(hours)

Output: [12, 13, 14]

This snippet utilizes list comprehension to iterate through each element in the datetime series and extract the hour component into a new list. It’s a clean and Pythonic one-liner that can be easily understood by those familiar with Python’s list comprehensions.
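Because the comprehension only relies on each element having an .hour attribute, it works just as well on a plain list of standard-library datetime objects, with no Pandas involved. The sample times below are hypothetical; the resulting list can be wrapped in a Series afterwards if needed.

```python
from datetime import datetime, timedelta

import pandas as pd

# Plain Python datetimes, one hour apart
start = datetime(2023, 1, 1, 12, 45)
times = [start + timedelta(hours=i) for i in range(3)]

# The same one-liner as above
hours = [t.hour for t in times]
print(hours)  # [12, 13, 14]

# Optional: bring the result back into Pandas
hours_series = pd.Series(hours, name="hour")
```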

Summary/Discussion

  • Method 1: dt.hour. Straightforward and vectorized. Best for Series and DataFrame columns of datetime64 dtype.
  • Method 2: lambda with apply(). Flexible, since the function can contain arbitrary logic, but slower on large data because it calls a Python function once per element.
  • Method 3: datetime.strptime() with hour. Ideal for individual date-time strings before they are placed into a Pandas Series or DataFrame.
  • Method 4: Timestamp.hour. Simple and direct for individual Timestamp objects.
  • Method 5: List comprehension. Pythonic and succinct, but it returns a plain list rather than a Series and, like apply(), loops in Python.
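As a quick sanity check, the sketch below runs Methods 1, 2 and 5 on the same sample series used throughout the article and confirms they yield identical hour values; they differ only in return type and speed.

```python
import pandas as pd

series = pd.Series(pd.date_range("2023-01-01 12:45", periods=3, freq="h"))

via_dt = series.dt.hour.tolist()                     # Method 1
via_apply = series.apply(lambda x: x.hour).tolist()  # Method 2
via_listcomp = [t.hour for t in series]              # Method 5

# All three approaches agree on the values
assert via_dt == via_apply == via_listcomp == [12, 13, 14]
print(via_dt)  # [12, 13, 14]
```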