In this article, we will see how to work with date-time
in Pandas. We will learn how to convert strings into date-time
objects, how to create date ranges in various ways, how to work with absolute time units, and how to restructure our date values using several Pandas functions.
Why work with date-time?
Before we get to the actual coding, letβs first clarify why it is important to be able to work with date-time
when using Pandas.
Whether itβs about stocks, weather, or the price development of Bitcoin. You name it. In all sorts of datasets, we find date-time values. Which day did the stock reach its highest value? At what time of the day blew the wind in Canada the strongest? The list goes on and on.
Thatβs why it is critical to be capable to handle date-time values properly. Luckily for us, Pandas provides us with a wide range of functions to deal with date-time values which we will get to know in the following sections.
Convert an argument into a Pandas date-time object
Now that we have seen why it is important to work with date-time values, we will get to the actual coding in this section. To start things off, letβs have a look at a simple data frame example:
import pandas as pd df = pd.DataFrame({'Date': ["2021-04-01", "2021-04-15", "2021-05-01", "2021-05-15"], 'Num': range(4)})
This is how the DataFrame df
looks:
Date | Num | |
0 | 2021-04-01 | 0 |
1 | 2021-04-15 | 1 |
2 | 2021-05-01 | 2 |
3 | 2021-05-15 | 3 |
We import the Pandas library first and create a Pandas data frame afterward. The data frame contains a βDate
β column with four individual dates. Also, we get a βNum
β column that represents numbers from 0 to 3. This is a simplified example of real-world data where we have date values and some numbers assigned to each date.
Letβs check the data types of the columns:
>>> df.dtypes Date object Num int64 dtype: object
As we can see, the βDate
β column is an βobject
β data type. The βobject
β data type refers to strings and mixed data types, so Pandas does not recognize the βDate
β column as a date-time data type. Luckily, Pandas provides us with the to_datetime()
function:
df['Date'] = pd.to_datetime(df['Date'])
This line of code sets the βDate
β columnβs data type to the date-time data type.
We can approve this by checking the data frameβs data types once again:
>>> df.dtypes Date datetime64[ns] Num int64 dtype: object
As we can see, we successfully set the βDate
β columnβs data type to βdatetime64[ns]
β.
To work with this data frame conveniently, we can set the βDate
β column as the index of the data frame. We achieve this by applying the set_index()
method:
df = df.set_index('Date')
The DataFrame:
Num | |
Date | |
2021-04-01 | 0 |
2021-04-15 | 1 |
2021-05-01 | 2 |
2021-05-15 | 3 |
Now, we can perform all sorts of calculations on this dataset. We will just do one example to show you whatβs possible:
df['Num'].loc['2021-05'].sum() # 5
Here, we calculate the sum of the βNum
β column in the month of May. Since the βDate
β column is the index of the data frame, we can apply the loc[]
function here.
Working with date ranges
In the data frame above, we set the unique dates one by one which was not a big problem because it was a small example with four rows only. When we want to create greater ranges of dates, typing them all in manually can be annoying and is not efficient at all. Therefore, we apply the date_range() function which provides us with various ways to create ranges of dates:
>>> pd.date_range(start='01-01-2021', end='31-01-2021') DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12', '2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16', '2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20', '2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24', '2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28', '2021-01-29', '2021-01-30', '2021-01-31'], dtype='datetime64[ns]', freq='D')
In this example, we use the date_range()
function with the two parameters βstart
β and βend
β where we set a starting date and an ending date respectively. The output is a DatetimeIndex
ranging from 01-01-2021 to 31-01-2021. We also get the data type which is βdatetime64[ns]
β again and the frequency which is βD
β and stands for βdays
β. This is a lot easier than typing it all in manually.
Instead of an end date, we can also define how many dates we want by applying the βperiods
β parameter:
>>> pd.date_range(start='01-01-2021', periods=10) DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10'], dtype='datetime64[ns]', freq='D')
We state the same starting date as before. The βperiods
β parameter is set to β10
β, so we get 10 dates in the outputted DatetimeIndex
.
Additionally, we can change the frequency. For example, we might not want to have a day frequency but a monthly one. This is where the βfreq
β parameter comes into play:
>>> pd.date_range(start='01-01-2021', periods=10, freq='M') DatetimeIndex(['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30', '2021-05-31', '2021-06-30', '2021-07-31', '2021-08-31', '2021-09-30', '2021-10-31'], dtype='datetime64[ns]', freq='M')
We assign the βfreq
β parameter the value βM
β which stands for βmonth
β.
We could also change that to hours, so we get an hourly frequency:
>>> pd.date_range(start='01-01-2021', periods=10, freq='H') DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 01:00:00', '2021-01-01 02:00:00', '2021-01-01 03:00:00', '2021-01-01 04:00:00', '2021-01-01 05:00:00', '2021-01-01 06:00:00', '2021-01-01 07:00:00', '2021-01-01 08:00:00', '2021-01-01 09:00:00'], dtype='datetime64[ns]', freq='H')
The term βH
β assigned to the βfreq
β parameter means βhour
β.
Especially when working with stock data or financial data in general, it is often useful to have datasets where only business days (Monday to Friday) play a role. On that account, Pandas provides us with a variation of the date_range()
function, namely the bdate_range()
function which provides date ranges with business days only:
>>> pd.bdate_range(start='01-01-2021', end='15-01-2021') DatetimeIndex(['2021-01-01', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-11', '2021-01-12', '2021-01-13', '2021-01-14', '2021-01-15'], dtype='datetime64[ns]', freq='B')
We create a date range again starting from 2021-01-01 to 2021-01-15. Note that some days are missing (for example 2021-01-02). The missing dates are the non-business days.
Resample the date-time values
Consider, we are given the following data series:
>>> series = pd.Series(range(9), index=pd.date_range('01-01-2021', periods=9)) >>> series 2021-01-01 0 2021-01-02 1 2021-01-03 2 2021-01-04 3 2021-01-05 4 2021-01-06 5 2021-01-07 6 2021-01-08 7 2021-01-09 8 Freq: D, dtype: int64
We have several dates and for each date, we get a value. Now, we might wonder: What is the mean value for every three days? Is there a simple way to get that information?
As it turns out: Yes, there is!
>>> series.resample('3D').mean() >>> series 2021-01-01 1.0 2021-01-04 4.0 2021-01-07 7.0 Freq: 3D, dtype: float64
We resample the time series into 3-day episodes and calculate the mean for every three days. The resample()
function expects a string that defines how we want to resample the time-series data. β3Dβ means β3 daysβ. Instead of the mean, we could also calculate the sum here:
>>> series.resample('3D').sum() 2021-01-01 3 2021-01-04 12 2021-01-07 21 Freq: 3D, dtype: int64
The resample()
function is really powerful since it allows us to resample our time data and perform a calculation on that newly structured data with just one line of code.
Inferring the most likely time-frequency
Especially in larger datasets, it might be difficult to tell the frequency of a DatetimeIndex
. The Pandas function infer_freq()
does what its name suggests: it infers the most likely frequency.
Letβs have a look at the following date range:
range = pd.date_range (start='01-01-2021', end='31-01-2021', periods=31)
We set a starting point at 01-01-2021 and an ending point at 31-01-2021 and we set the βperiods
β parameter to 31. Since we start on the first day of the month, end on the last day of the month, and have 31 periods, we might guess that the most likely frequency is daily.
We apply the infer_freq()
function to see if thatβs right:
pd.infer_freq(range) # 'D'
Since βD
β stands for day, we are right!
Working with time deltas
By now, we have seen how to work with specific dates and timestamps. However, sometimes we need to work with absolute time units, for example, "2 days"
, "14 minutes"
etc.
In Pandas, these are called βtime deltasβ. They represent absolute time differences.
And we use the to_timedelta()
function to convert an argument into a βTimedelta
β object.
pd.to_timedelta('5 days, 5 minutes') # Timedelta('5 days 00:05:00')
We assign the to_timedelta()
function the string "5 days, 5 minutes"
and the function transforms this string into a βTimedelta
β object. Note that the "5 minutes"
part was transformed into the format β00:05:00
β.
Alternatively, we can assign the function a list with multiple absolute time units which are then transformed into βTimedelta
β objects:
>>> pd.to_timedelta(['5 days 3 s', '11.3 ms', '3 W']) TimedeltaIndex(['5 days 00:00:03', '0 days 00:00:00.011300', '21 days 00:00:00'], dtype='timedelta64[ns]', freq=None)
So, what do we do with time deltas? As mentioned above, time deltas are absolute differences in time. We could, for example, create a data frame with one column representing dates in a daily frequency and another column with time deltas counting the absolute days:
df2 = pd.DataFrame({'Date': pd.date_range(start='01-01-2021', periods=31, freq='D'), 'Number days': pd.timedelta_range(start='1 day', periods=31)}) # df2:
Date | Number days | |
0 | 2021-01-01 | 1 days |
1 | 2021-01-02 | 2 days |
2 | 2021-01-03 | 3 days |
3 | 2021-01-04 | 4 days |
4 | 2021-01-05 | 5 days |
5 | 2021-01-06 | 6 days |
6 | 2021-01-07 | 7 days |
7 | 2021-01-08 | 8 days |
8 | 2021-01-09 | 9 days |
9 | 2021-01-10 | 10 days |
We create the βDate
β column using the date_range()
function which we introduced in a previous section. For the βNumber days
β column, we apply the timedelta_range()
function. This function works basically the same as the date_range()
function but it does not produce dates or times, but absolute time units such as β2 daysβ
or β3 daysβ
in this case.
Summary
Working with date-time values is a very essential skill since we often find datasets with date and/or time values in them. In this tutorial, we have learned how to apply various date-time functions Pandas provides us with. We learned how to convert arguments into date-time values, how to create several date ranges, how to resample our date values, and how to work with time deltas.
For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter Blog page.
Happy Coding!