5 Best Ways to Convert String Data into Datetime in Python Pandas

💡 Problem Formulation: When working with datasets in Python Pandas, it is common to encounter date information stored as strings. Converting these strings into a datetime type is crucial for time series analysis, enabling operations like resampling, time-based indexing, and more. As an example, a dataset may contain date information as ‘2022-03-01’, which should be converted to a datetime object to perform date-specific manipulations.

Method 1: Using pd.to_datetime()

Pandas provides a versatile function pd.to_datetime() for converting string data into datetime objects. It accepts scalars, lists, and Series, as well as DataFrames whose columns hold date components such as year, month, and day, and it can handle a variety of string formats. The format parameter specifies the exact pattern to match, ensuring accurate parsing even for non-standard formats.

Here’s an example:

import pandas as pd

date_series = pd.Series(["2021-01-01", "2021-02-01", "2021-03-01"])
date_series_datetime = pd.to_datetime(date_series, format="%Y-%m-%d")

print(date_series_datetime)

Output:

0   2021-01-01
1   2021-02-01
2   2021-03-01
dtype: datetime64[ns]

The code converts a series of strings representing dates into datetime objects, ensuring they can now be manipulated as actual dates within Pandas.
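As a minimal sketch of the format parameter on a non-ISO pattern, the example below parses day-first dates and uses errors="coerce" so that bad values become NaT instead of raising. The sample strings are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: day-first dates plus one invalid entry.
raw = pd.Series(["01/02/2021", "15/03/2021", "not a date"])

# format="%d/%m/%Y" reads the first field as the day;
# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
```

Counting the NaT values afterwards (for example with parsed.isna().sum()) is a quick way to see how many rows failed to parse.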

Method 2: Inferring Datetime Format Automatically

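When string formats vary, pd.to_datetime() can parse each value individually instead of applying one format to the whole Series. In pandas 2.0 and later, the function infers a single format from the first value and applies it to the rest, so genuinely mixed inputs need format="mixed"; older versions fell back to per-element parsing without it. This is convenient for datasets that contain dates in several recognized formats, but ambiguous strings such as "03-01-2021" are read month-first unless dayfirst=True is passed.

Here’s an example:

# Requires pandas 2.0 or later for format="mixed"
mixed_date_series = pd.Series(["January 1, 2021", "2021/02/01", "03-01-2021"])
dates_inferred = pd.to_datetime(mixed_date_series, format="mixed")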

print(dates_inferred)

Output:

0   2021-01-01
1   2021-02-01
2   2021-03-01
dtype: datetime64[ns]

Here, each string is parsed on its own, so the three different layouts all become datetime objects. Note that the ambiguous "03-01-2021" is interpreted month-first, as March 1, by default.
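The ambiguity around day and month order is easy to check on a single value. The short sketch below parses the same string twice, once with the default month-first reading and once with dayfirst=True.

```python
import pandas as pd

ambiguous = "03-01-2021"

# Default behaviour: the first field is treated as the month.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first field is treated as the day.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first.date())  # 2021-03-01
print(day_first.date())    # 2021-01-03
```

When the layout of a column is known in advance, an explicit format string is usually the safer choice than relying on either default.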

Method 3: Using strftime() and strptime()

The Python standard library offers datetime.strptime() for parsing dates from strings and datetime.strftime() for formatting datetime objects as strings. Though not part of Pandas, these functions can be applied to Pandas objects through the apply() method, providing compatibility with non-Pandas date parsing needs.

Here’s an example:

from datetime import datetime

date_series_strp = date_series.apply(lambda x: datetime.strptime(x, "%Y-%m-%d"))

print(date_series_strp)

Output:

0   2021-01-01
1   2021-02-01
2   2021-03-01
dtype: datetime64[ns]

This example uses the apply() method to individually convert each string in the series to a datetime object using Python’s native datetime.strptime() method.
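The reverse direction, formatting datetimes back into strings, is what strftime() covers. As a small sketch, Pandas exposes it on datetime Series through the .dt accessor, so no apply() is needed. The output pattern here is just an example.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2021-01-01", "2021-02-01", "2021-03-01"]))

# Series.dt.strftime formats each datetime back into a string.
formatted = dates.dt.strftime("%d/%m/%Y")

print(formatted.tolist())  # ['01/01/2021', '01/02/2021', '01/03/2021']
```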

Method 4: Using Pandas DataFrame.apply() Method

Sometimes, datasets have date and time split across different columns. In such cases, you can combine them into a single datetime column by calling apply() on the DataFrame with axis=1, passing a lambda function that joins the string components of each row and parses the result.

Here’s an example:

df = pd.DataFrame({
  'date': ["2021-01-01", "2021-02-01", "2021-03-01"],
  'time': ["10:00:00", "11:00:00", "12:00:00"]
})
df['datetime'] = df.apply(lambda row: pd.to_datetime(f"{row['date']} {row['time']}"), axis=1)

print(df['datetime'])

Output:

0   2021-01-01 10:00:00
1   2021-02-01 11:00:00
2   2021-03-01 12:00:00
Name: datetime, dtype: datetime64[ns]

This method enables combining date and time from separate columns into a new column that holds the combined datetime information.
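For plain string columns like these, a vectorized alternative is to concatenate the columns first and parse the result in one call. This is a sketch of that variant, not a replacement for apply() when each row needs custom logic; the format string is an assumption based on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2021-01-01", "2021-02-01", "2021-03-01"],
    "time": ["10:00:00", "11:00:00", "12:00:00"],
})

# Concatenate the string columns, then parse the whole column in one call.
combined = df["date"] + " " + df["time"]
df["datetime"] = pd.to_datetime(combined, format="%Y-%m-%d %H:%M:%S")

print(df["datetime"])
```

Because the parsing happens once for the whole column rather than once per row, this version tends to scale better on large DataFrames.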

Bonus One-Liner Method 5: Using List Comprehension

A more Pythonic approach is a list comprehension that applies datetime.strptime() to each element of a Pandas Series and wraps the result in a new Series. This method is concise and works well for smaller datasets or simple date patterns.

Here’s an example:

date_series_list_comp = pd.Series([datetime.strptime(date, "%Y-%m-%d") for date in date_series])

print(date_series_list_comp)

Output:

0   2021-01-01
1   2021-02-01
2   2021-03-01
dtype: datetime64[ns]

This single line parses every string in the series with datetime.strptime() and builds a new Series of datetime values from the results.

Summary/Discussion

  • Method 1: Using pd.to_datetime(). Highly versatile and recommended for most use cases. Handles a multitude of date formats and provides the option to specify the format. It is vectorized and usually the fastest option, especially when format is given; it slows down when it has to parse each value individually.
  • Method 2: Inferring Datetime Format Automatically. Convenient for mixed date formats recognized by Pandas. It reduces the need for specifying the format, but may raise errors or give incorrect results with ambiguous or unrecognized date strings. In pandas 2.0 and later, mixed formats in one Series require format="mixed".
  • Method 3: Using strftime() and strptime(). This is a go-to for those familiar with Python’s datetime handling outside of Pandas and provides a bridge between Pandas and Python’s native datetime methods. It is slower than pd.to_datetime() on large data because apply() calls a Python function for every element.
  • Method 4: Using Pandas DataFrame.apply() Method. Ideal for combining date and time from separate columns within a DataFrame. It is flexible, but row-wise apply() has the same per-element overhead as Method 3.
  • Method 5: Bonus One-Liner Using List Comprehension. Offers a Pythonic and succinct way of converting dates. It’s great for small datasets and simple conversions but may not scale well with larger datasets or handle complex parsing.
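To check the efficiency notes above on your own machine, a rough timing sketch like the following can help. The data size, the number of runs, and the function names are arbitrary choices for illustration, and the timings will vary by pandas version and hardware.

```python
import timeit
from datetime import datetime

import pandas as pd

# Hypothetical test data: 20,000 ISO-formatted date strings.
strings = pd.Series(
    pd.date_range("2000-01-01", periods=20_000, freq="D").strftime("%Y-%m-%d")
)

def vectorized():
    # Method 1: one vectorized call with an explicit format.
    return pd.to_datetime(strings, format="%Y-%m-%d")

def with_apply():
    # Method 3: Python-level strptime call per element via apply().
    return strings.apply(lambda s: datetime.strptime(s, "%Y-%m-%d"))

def with_list_comp():
    # Method 5: Python-level strptime call per element in a list comprehension.
    return pd.Series([datetime.strptime(s, "%Y-%m-%d") for s in strings])

for fn in (vectorized, with_apply, with_list_comp):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

All three functions produce the same dates, so any difference in the printed times reflects parsing overhead only.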