5 Best Ways to Convert Python Pandas Series to datetime

💡 Problem Formulation: Analysts often deal with time series data in Python, where date and time information is stored as strings within a Pandas Series object. The issue arises when these string dates need to be converted into actual datetime objects for proper time series analysis. For example, the input might be a Series of ‘2021-01-01’, ‘2021-01-02’, … and the desired output is a Series with actual Python datetime objects for these dates.

Method 1: Using pd.to_datetime()

One of the most straightforward ways to convert a Pandas Series to datetime is by using the built-in Pandas function pd.to_datetime(). This function is highly flexible, automatically infers the date format, and can handle missing values. It can convert various input formats, including scalar, list, series, and even DataFrame columns to datetime.

Here’s an example:

import pandas as pd

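# All strings share one format: recent pandas versions infer the format from
# the first value and may reject entries that do not match it.
date_series = pd.Series(['2021-01-01', '2021-01-02', '2021-01-03'])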
date_series_dt = pd.to_datetime(date_series)
print(date_series_dt)

Output:

0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]

This code snippet creates a Pandas Series with string dates and then uses pd.to_datetime() to convert the Series into datetime objects. The new Series has a dtype of datetime64[ns], indicating that the conversion was successful.
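
Because pd.to_datetime() accepts more than a Series, a short sketch of the other input types mentioned above may be useful. The DataFrame and its column names (date, value) are made up for illustration and are not part of the original example.

```python
import pandas as pd

# A single string returns a pandas Timestamp.
single = pd.to_datetime('2021-01-01')

# A list of strings returns a DatetimeIndex.
from_list = pd.to_datetime(['2021-01-01', '2021-01-02'])

# A DataFrame column is a Series, so it can be converted and assigned back.
# The frame and its 'date' and 'value' columns are hypothetical.
df = pd.DataFrame({'date': ['2021-01-01', '2021-01-02'], 'value': [10, 20]})
df['date'] = pd.to_datetime(df['date'])

print(type(single).__name__)
print(type(from_list).__name__)
print(df.dtypes)
```

The return type follows the input type, so it is worth checking which one you get before chaining further calls such as the .dt accessor, which is only available on a Series.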

Method 2: Specifying Date Format Manually

While pd.to_datetime() often infers the format of the dates correctly, some inputs are ambiguous or unusual enough that the format has to be specified manually. This is done with the format parameter, which can also speed up parsing because format inference is skipped.

Here’s an example:

import pandas as pd

date_series = pd.Series(['01-01-2021', '02-01-2021', '03-01-2021'])
date_series_dt = pd.to_datetime(date_series, format='%d-%m-%Y')
print(date_series_dt)

Output:

0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]

In this example, the format parameter is set to match the date format of the strings in the Series. This tells pd.to_datetime() exactly how to parse the dates, resulting in a datetime Series.
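
To see why an explicit format matters, the following sketch parses the same ambiguous strings twice, once as day-month-year and once as month-day-year. The sample values are invented for illustration.

```python
import pandas as pd

# '01-02-2021' could mean 1 February or 2 January, depending on convention.
ambiguous = pd.Series(['01-02-2021', '03-04-2021'])

# Interpret the strings as day-month-year.
day_first = pd.to_datetime(ambiguous, format='%d-%m-%Y')

# Interpret the same strings as month-day-year.
month_first = pd.to_datetime(ambiguous, format='%m-%d-%Y')

print(day_first.dt.strftime('%Y-%m-%d').tolist())    # ['2021-02-01', '2021-04-03']
print(month_first.dt.strftime('%Y-%m-%d').tolist())  # ['2021-01-02', '2021-03-04']
```

Both calls succeed without any warning, so only the format string tells pandas which reading is intended.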

Method 3: Handling Errors with errors Argument

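Sometimes the Series may contain invalid date strings. The errors parameter in pd.to_datetime() controls what happens in that case: errors='raise' (the default) raises an exception, while errors='coerce' converts invalid values to NaT (Not a Time). The older errors='ignore' option, which returned the input unchanged, is deprecated in recent pandas releases.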

Here’s an example:

import pandas as pd

date_series = pd.Series(['2021-01-01', 'not a date', '2021-01-03'])
date_series_dt = pd.to_datetime(date_series, errors='coerce')
print(date_series_dt)

Output:

0   2021-01-01
1          NaT
2   2021-01-03
dtype: datetime64[ns]

The invalid string ‘not a date’ is coerced to NaT, allowing the other dates to be parsed without error. The result is a Series with valid datetime objects and NaT for any invalid entries.
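
Coercion hides which inputs failed, so it can help to compare the parsed result with the original strings. The sketch below is one possible way to do that; the variable names are illustrative.

```python
import pandas as pd

raw = pd.Series(['2021-01-01', 'not a date', '2021-01-03'])
parsed = pd.to_datetime(raw, errors='coerce')

# A row failed to parse if the result is NaT but the input was not missing.
failed = parsed.isna() & raw.notna()
print(raw[failed].tolist())  # ['not a date']

# Drop the unparseable rows once they have been inspected.
clean = parsed.dropna()
print(len(clean))  # 2
```

Checking raw.notna() as well keeps genuinely missing inputs from being reported as parse failures.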

Method 4: Converting Unix Timestamps

When working with Unix timestamps, we need to convert these numerical representations into datetime. Passing these timestamps directly to pd.to_datetime() and setting the unit parameter to ‘s’ for seconds (or ‘ms’ for milliseconds) accomplishes this.

Here’s an example:

import pandas as pd

timestamp_series = pd.Series([1609459200, 1609545600, 1609632000])
date_series_dt = pd.to_datetime(timestamp_series, unit='s')
print(date_series_dt)

Output:

0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]

The Unix timestamps in the Series are converted to datetime objects. The unit='s' parameter is critical here for indicating that the numbers represent seconds since the epoch.
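
As a complement, here is a small sketch of the millisecond case, plus the utc=True option of pd.to_datetime(), which returns timezone-aware values. The sample timestamps are the same instants as above; the variable names are illustrative.

```python
import pandas as pd

# The same instants expressed in milliseconds since the epoch.
ms_series = pd.Series([1609459200000, 1609545600000])
from_ms = pd.to_datetime(ms_series, unit='ms')

# Unix timestamps count from the epoch in UTC; utc=True makes that explicit
# by returning a timezone-aware Series instead of a naive one.
sec_series = pd.Series([1609459200, 1609545600])
aware = pd.to_datetime(sec_series, unit='s', utc=True)

print(from_ms.iloc[0])
print(aware.dtype)
```

Using the wrong unit does not necessarily raise an error. Millisecond values read as seconds, for instance, would land tens of thousands of years in the future, so a quick look at the first few converted values is a cheap sanity check.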

Bonus One-Liner Method 5: Using Lambda Functions

For finer control or more complex transformations, we can apply a lambda function that utilizes the datetime module to convert each element in the Series to a datetime object.

Here’s an example:

import pandas as pd
from datetime import datetime

date_series = pd.Series(['20210101', '20210102', '20210103'])
date_series_dt = date_series.apply(lambda x: datetime.strptime(x, '%Y%m%d'))
print(date_series_dt)

Output:

0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]

This snippet uses the apply() method with a lambda function, which calls datetime.strptime() to parse the string according to the specified format. It’s a more manual method but gives full control over the parsing process.
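
The extra control pays off when the parsing logic does not fit in a single format string. The sketch below swaps the lambda for a small named function that tries several formats in turn. The function name parse_with_fallback and the list of candidate formats are hypothetical and chosen only for illustration.

```python
import pandas as pd
from datetime import datetime

# Hypothetical list of formats to try, in order of preference.
CANDIDATE_FORMATS = ('%Y%m%d', '%d/%m/%Y')

def parse_with_fallback(text):
    """Return the first successful parse, or NaT if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return pd.NaT

mixed = pd.Series(['20210101', '02/01/2021', 'unknown'])
parsed = mixed.apply(parse_with_fallback)
print(parsed)
```

This runs one Python-level call per element, so it is best reserved for small Series or for inputs that the vectorized pd.to_datetime() cannot handle directly.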

Summary/Discussion

  • Method 1: pd.to_datetime(). Quick and flexible. It might not work as expected with very unusual date formats.
  • Method 2: Manually Specifying Date Format. Ensures correct parsing of dates. Requires knowledge of the exact format.
  • Method 3: Handling Errors. Provides robustness in the face of invalid data. Might introduce NaT which needs to be handled later.
  • Method 4: Converting Unix Timestamps. Straightforward for Unix timestamps. Not suitable for other date string formats without conversion.
  • Method 5: Using Lambda Functions. Highly customizable. Can be overkill for simple use cases and is slower than vectorized operations.