Converting Pandas Dataframes to NumPy Arrays of Python Datetime Objects

💡 Problem Formulation: When working with time series data in Pandas, there may be a need to convert a dataframe column with datetime entries into a NumPy array of Python datetime.time objects. Imagine you have a dataframe with a column representing timestamps, and you wish to extract just the time component as a NumPy array. This could facilitate various time-based analyses or integrations with systems that require time objects.

Method 1: Using dt.time with to_numpy()

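This method uses the dt accessor on a Pandas Series of datetime values to get a Series of time objects, then converts that Series with to_numpy(). The result is a NumPy array of Python datetime.time objects with dtype object, because NumPy has no native time-of-day type. Since dt.time does its work inside pandas instead of calling a Python function per element, it is typically the fastest of the methods shown here on large datasets.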

Here’s an example:

import pandas as pd
import numpy as np

# Create a dataframe with timestamp data
df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# Convert to NumPy array of time objects
time_array = df['Timestamp'].dt.time.to_numpy()


Output of this code snippet:

[datetime.time(8, 0), datetime.time(9, 0), datetime.time(10, 0), datetime.time(11, 0)]

This snippet creates a range of hourly timestamps, extracts the time component of each, and converts the result to a NumPy array of datetime.time objects. The dt accessor exposes datetime-like properties of a Series, such as date, time, and hour.
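NumPy has no dtype for a time of day, so the converted array holds generic Python objects. A quick check, reusing the same sample data as above, makes that visible and contrasts it with converting the raw column directly:

```python
import datetime

import pandas as pd

# Same sample data as above: four hourly timestamps starting at 08:00
df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# Converting the raw column keeps NumPy's datetime64 representation
raw = df['Timestamp'].to_numpy()
print(raw.dtype.kind)  # M (NumPy's kind code for datetime64)

# Extracting the time part first produces an array of Python objects
time_array = df['Timestamp'].dt.time.to_numpy()
print(time_array.dtype)                          # object
print(isinstance(time_array[0], datetime.time))  # True
```

If you need the full date and time, the raw datetime64 array is usually the better choice; the object array is only worth it when downstream code expects datetime.time values.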

Method 2: Using apply() method with lambda function

The apply() method in Pandas can be combined with a lambda function that extracts the time component of each element in a Series. The resulting Series is then converted to a NumPy array. Because apply() accepts any function, this approach is easy to extend to more complex per-element transformations.

Here’s an example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# Use apply with a lambda to extract time and convert to NumPy array
time_array = df['Timestamp'].apply(lambda x: x.time()).to_numpy()


Output of this code snippet:

[datetime.time(8, 0), datetime.time(9, 0), datetime.time(10, 0), datetime.time(11, 0)]

This snippet uses apply() to run the lambda on each element of the ‘Timestamp’ column. Each element is a Pandas Timestamp, and its time() method returns the corresponding datetime.time object. The resulting Series of time objects is then converted to a NumPy array with to_numpy().
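Because apply() accepts any callable, the lambda can be replaced by a named function once the per-element logic grows. As a minimal sketch, the hypothetical helper time_to_minute below truncates each time to whole minutes; the sample timestamps with non-zero seconds are made up for illustration:

```python
import datetime

import pandas as pd

# Illustrative data with non-zero seconds so the truncation is visible
df = pd.DataFrame({'Timestamp': pd.to_datetime(['2023-01-01 08:15:42', '2023-01-01 09:30:05'])})

def time_to_minute(ts):
    """Hypothetical helper: time of day with seconds and microseconds dropped."""
    return ts.time().replace(second=0, microsecond=0)

# apply() passes each Timestamp to the helper; to_numpy() yields an object array
time_array = df['Timestamp'].apply(time_to_minute).to_numpy()
print(time_array.tolist())  # [datetime.time(8, 15), datetime.time(9, 30)]
```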

Method 3: Using List Comprehension

List comprehension in Python provides a concise way to construct lists. You can use list comprehension to iterate over the datetime objects in a Pandas Series and collect their time parts. The list can then be converted to a NumPy array.

Here’s an example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# Convert to NumPy array using a list comprehension over the column's Timestamps
time_array = np.array([ts.time() for ts in df['Timestamp']])


Output of this code snippet:

[datetime.time(8, 0), datetime.time(9, 0), datetime.time(10, 0), datetime.time(11, 0)]

In this approach, we use a list comprehension to iterate over the ‘Timestamp’ column and call time() on each datetime object. The resulting list of time objects is converted to a NumPy array using the np.array() function.
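One place where the explicit loop pays off is missing data: a datetime column can contain NaT, and a comprehension makes it easy to decide what to store for those rows. A minimal sketch, assuming you want None for missing timestamps (an illustrative choice, not a pandas convention):

```python
import datetime

import numpy as np
import pandas as pd

# A column with a missing value; pandas stores it as NaT
s = pd.Series(pd.to_datetime(['2023-01-01 08:00', None, '2023-01-01 10:30']))

# Skip the time() call for missing entries and store None instead
time_array = np.array([ts.time() if pd.notna(ts) else None for ts in s], dtype=object)
print(time_array.tolist())  # [datetime.time(8, 0), None, datetime.time(10, 30)]
```

Passing dtype=object explicitly guarantees a one-dimensional object array regardless of what the list contains.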

Method 4: Using map() Function

Python’s built-in map() function applies a given function to each item of an iterable (such as a list or a Pandas Series) and returns a lazy map iterator over the results; in Python 3 it does not return a list. We can use it to apply the time() method to each element of our dataframe’s datetime column.

Here’s an example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# Use map to apply time() method to each datetime object and convert to NumPy array
time_array = np.array(list(map(lambda x: x.time(), df['Timestamp'])))


Output of this code snippet:

[datetime.time(8, 0), datetime.time(9, 0), datetime.time(10, 0), datetime.time(11, 0)]

This example uses map() with a lambda function that returns the time portion of the datetime object. The resulting map object is converted to a list, which is then turned into a NumPy array.
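Since map() returns a lazy iterator, it is the list() call that actually runs the conversion. The sketch below makes this visible and, as an alternative to the lambda, uses operator.methodcaller from the standard library to call time() on each element:

```python
from operator import methodcaller

import numpy as np
import pandas as pd

df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# methodcaller('time') calls .time() on whatever it is given, replacing the lambda
mapped = map(methodcaller('time'), df['Timestamp'])
print(type(mapped).__name__)  # map: a lazy iterator, nothing computed yet

# list() consumes the iterator; np.array() then copies the list into an array
time_array = np.array(list(mapped))
print(time_array.tolist())
# [datetime.time(8, 0), datetime.time(9, 0), datetime.time(10, 0), datetime.time(11, 0)]

# The iterator is now exhausted, so a second pass yields nothing
print(list(mapped))  # []
```

The exhausted iterator is a common pitfall: reusing the same map object for a second conversion silently produces an empty result.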

Bonus One-Liner Method 5: Using numpy vectorize() function

NumPy’s vectorize() function wraps a regular Python function so that it can be called on whole arrays, element by element. It is a convenience rather than a speed-up: NumPy's documentation describes the implementation as essentially a for loop. It still gives a compact one-liner for turning a datetime column into an array of time objects.

Here’s an example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# Vectorize the time extraction; otypes=[object] declares the output dtype up front
vectorized_time = np.vectorize(lambda x: x.time(), otypes=[object])

# Apply it to an object array of the column's Timestamp objects
time_array = vectorized_time(df['Timestamp'].to_numpy(dtype=object))


Output of this code snippet:

[datetime.time(8, 0), datetime.time(9, 0), datetime.time(10, 0), datetime.time(11, 0)]

This compact example creates a vectorized function that extracts the time portion of each timestamp and applies it to an object array of the column's Timestamp objects, producing the desired NumPy array of time objects. Passing df['Timestamp'].values instead does not work, because that array holds numpy.datetime64 values, which have no time() method.
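The difference between .values and an object array is easy to check directly. This sketch prints the element type each one hands to np.vectorize and confirms that the vectorized result matches the dt.time result from Method 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Timestamp': pd.date_range(start='2023-01-01 08:00', periods=4, freq='h')})

# .values holds numpy.datetime64 scalars, which have no time() method
print(type(df['Timestamp'].values[0]).__name__)  # datetime64

# to_numpy(dtype=object) holds pandas Timestamp objects instead
objs = df['Timestamp'].to_numpy(dtype=object)
print(type(objs[0]).__name__)  # Timestamp

# With otypes given, np.vectorize skips the trial call it would otherwise
# make on the first element to infer the output type
vectorized_time = np.vectorize(lambda x: x.time(), otypes=[object])
time_array = vectorized_time(objs)

# Same values as the dt.time approach
print(np.array_equal(time_array, df['Timestamp'].dt.time.to_numpy()))  # True
```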

Summary/Discussion

  • Method 1: dt.time with to_numpy(). Strengths: It is a straightforward and idiomatic approach specific to Pandas. Weaknesses: Reliant on Pandas implementation and might not offer as much flexibility for complex data manipulations.
  • Method 2: apply() with lambda function. Strengths: Offers flexibility and is useful for more complex data transformations. Weaknesses: Might be less efficient than vectorized operations.
  • Method 3: Using List Comprehension. Strengths: Pythonic and easy to read. Weaknesses: Potentially less performant with very large datasets because it’s not a vectorized operation.
  • Method 4: Using map() Function. Strengths: Works with any iterable and is a Python built-in. Weaknesses: Results in an intermediate list, which can be memory-inefficient.
  • Bonus One-Liner Method 5: Using NumPy vectorize(). Strengths: Compact one-liner suited for simple transformations. Weaknesses: Despite the name, it runs a Python-level loop, so it offers no speed advantage over a list comprehension, and it needs an object array of Timestamps rather than the raw datetime64 values.
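The trade-offs above are easiest to judge on your own data. The following sketch, assuming an illustrative 20,000-row Series of one-second timestamps, first verifies that all five approaches return the same values and then times each one with timeit. The vectorize entry uses to_numpy(dtype=object) so that the lambda receives Timestamps. No timings are shown here because they depend on hardware and library versions:

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative sample: 20,000 timestamps one second apart
s = pd.Series(pd.date_range(start='2023-01-01', periods=20_000, freq='s'))

methods = {
    'dt.time': lambda: s.dt.time.to_numpy(),
    'apply': lambda: s.apply(lambda x: x.time()).to_numpy(),
    'list comprehension': lambda: np.array([ts.time() for ts in s]),
    'map': lambda: np.array(list(map(lambda x: x.time(), s))),
    'vectorize': lambda: np.vectorize(lambda x: x.time(), otypes=[object])(
        s.to_numpy(dtype=object)
    ),
}

# Every method should produce the same time-of-day values
reference = methods['dt.time']()
for name, fn in methods.items():
    assert np.array_equal(fn(), reference), name

# Average over three runs per method; measure rather than assume
for name, fn in methods.items():
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{name:>20}: {seconds * 1000:.1f} ms')
```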