Pandas resample(), to_period(), to_timestamp(), tz_localize(), tz_convert()

The Pandas DataFrame/Series has several methods related to time series.

Preparation

Before any data manipulation can occur, one (1) new library will require installation.

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

💡 Note: The pytz comes packaged with pandas and does not require installation. However, this library is needed for the tz_ localize() and tz_convert() methods to work.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to Install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import pytz

DataFrame resample()

The resample() method is useful for manipulating the frequency and time-series data.

This DataFrame/Series must contain a datetime-like index, for example:

DatetimeIndex,
PeriodIndex,
TimedeltaIndex, or
the class must pass a date-like series/index to the on/level keyword parameter.

The syntax for this method is as follows:

DataFrame.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)

Parameter	Description
`rule`	This parameter is the offset (string/object) representing a target conversion.
`axis`	If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row.
`closed`	This parameter determines which side of the bin interval is closed. Default `'left'` for all frequency offsets except: – `'M', 'A', 'Q', 'BM', 'BA', 'BQ'`, and `'W'`, default `'right'`.
`label`	This parameter determines which bin edge to label bucket. Default `'left'` for all frequency offsets except: – `'Q', 'BM', 'BA', 'BQ'`, and `'W'`, default `'right'`.
`convention`	This parameter is the `PeriodIndex`, and it controls whether to use the start/end of the rule. The available options are: `'start'`, `'end'`, `'s'`, or `'e'`. Default is `'start'`.
`kind`	This parameter is a timestamp/period and is for the `PeriodIndex`.
`loffset`	Not in use since v1.1.0. Add this to `df.index` after `resample()` has taken place.
`base`	Not in use since v1.1.0. Use `'offset'` or `'origin'` instead.
`on`	If a DataFrame, the `datetime` column to use instead of index for resampling.
`level`	A datetime level in a `MultiIndex` scenario to use for resampling.
`origin`	The timestamp to adjust the grouping. The origin time-zone must match the index. If a string, one of the following: `'epoch'`, `'start'`, `'start_day'`, `'end'`, and `'end_day'`
`offset`	This parameter is the offset `timedelta` which adds to the origin.

Rivers Clothing is having a 3-hour blow-out sale for a new line they have introduced, scarfs. This example resamples the sales data and adds up the total number of scarf sales per hour.

df = pd.read_csv('rivers.csv', parse_dates=['date'], index_col=['date'])
print(df)

result = df.resample('1H').sum()
print(result)

Line [1] reads in a CSV file, parses the date column, and sets this column as the index. The output saves to df.
Line [2] outputs the DataFrame to the terminal.
Line [3] resamples the data by grouping the total scarf sales by the hour. The output saves to result.
Line [4] outputs the result to the terminal.

Output

df

	Item	color	sold
date
2022-01-27 08:17:00	scarf	red	3
2022-01-27 08:23:00	scarf	blue	2
2022-01-27 08:47:00	scarf	pink	1
2022-01-27 09:01:00	scarf	black	11
2022-01-27 09:28:00	scarf	brown	6
2022-01-27 09:51:00	scarf	burgundy	15
2022-01-27 10:11:00	scarf	black	21
2022-01-27 10:13:00	scarf	brown	10
2022-01-27 10:22:00	scarf	black	9
2022-01-27 10:28:00	scarf	navy	30

result

	sold
date
2022-01-27 08:00:00	6
2022-01-27 09:00:00	32
2022-01-27 10:00:00	70

DataFrame to_period()

The to_period() method converts a DataFrame/Series from a DatetimeIndex format to a PeriodIndex format based on the selected frequency.

The syntax for this method is as follows:

DataFrame.to_period(freq=None, axis=0, copy=True)

Parameter	Description
`freq`	This parameter is an available frequency of the `PeriodIndex` method.
`axis`	If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row.
`copy`	If `True`, the data copies. By default, `True`.

For these examples, we have a list containing datetimes. These datetimes convert to Monthly & Yearly formats.

Code –Monthly Format

idx = pd.to_datetime(['2022-01-15 08:17:00',
                      '2022-01-15 08:23:00',
                      '2022-01-15 08:47:00',
                      '2022-01-15 09:01:00',
                      '2022-01-15 09:28:00'])
print(idx)

result = idx.to_period('M')
print(result)

Line [1] converts a list of strings to a datetime format and saves it to idx.
Line [2] outputs the contents of idx to the terminal.
Line [3] converts the contents of idx to a PeriodIndex Monthly format. The output saves to result.
Line [4] outputs the result to the terminal.

Output

idx
DatetimeIndex(['2022-01-15 08:17:00', '2022-01-15 08:23:00',
               	         '2022-01-15 08:47:00', '2022-01-15 09:01:00',
               	         '2022-01-15 09:28:00'],
              	         dtype='datetime64[ns]', freq=None)

result
PeriodIndex(['2022-01', '2022-01', '2022-01', '2022-01', '2022-01'], dtype='period[M]')

Code –Yearly Example

idx = pd.to_datetime(['2018-01-15 08:17:00',
                      '2019-01-15 08:23:00',
                      '2020-01-15 08:47:00',
                      '2021-01-15 09:01:00',
                      '2022-01-15 09:28:00'])
print(idx)

result = idx.to_period('Y')
print(result)

Line [1] converts a list of strings to a datetime format and saves it to idx.
Line [2] outputs the contents of idx to the terminal.
Line [3] converts the contents of idx to a PeriodIndex Yearly format. The output saves to result.
Line [4] outputs the result to the terminal.

Output

idx
DatetimeIndex(['2018-01-15 08:17:00', '2019-01-15 08:23:00',               
	         '2020-01-15 08:47:00', '2021-01-15 09:01:00',
                            '2022-01-15 09:28:00'],
                            dtype='datetime64[ns]', freq=None)

result
PeriodIndex(['2018', '2019', '2020', '2021', '2022'], dtype='period[A-DEC]')

Note: Definition of frequency period [A-DEC]:

A: year-end
DEC: year ends in December

DataFrame tz_localize()

The tz_localize() method localizes a time zone’s native index of a DataFrame/Series to a targeted time zone.

To view a list of available time zones, click here or navigate to an IDE and run the following code:

$ print(pytz.all_timezones)

The syntax for this method is as follows:

DataFrame.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')

Parameter	Description
`tz`	The parameter is a string of a valid time zone. Ex: `'America/Phoenix'`.
`axis`	If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row.
`level`	A datetime level in a `MultiIndex` scenario to use for resampling.
`copy`	If `True`, this parameter makes a copy.
`ambiguous`	If the clock moves backward (counterclockwise), an error may occur. The available options are: – `'infer'`: infer DST-transition hours (based on order). – Boolean-array: `True` reflects as a DST. `False`, set as a non-DST time. – `'NaT'`: returns this value if ambiguous times occur. – The default is `'raise'`.
`nonexistent`	If the clock moves forward (clockwise), an error may occur. The available options are: – `'shift-forward'`: moves the blank/invalid time forward to the nearest non-empty time. – `'shift-backward'`: moves the blank/empty time backward to the nearest non-empty time. – `'NaT'`: returns this value if ambiguous times occur. – `timedelta`: shift empty times by the timedelta. – The default is `'raise'`.

For this example, the time localizes to Berlin, Germany.

tz_1 = pd.date_range('2022-12-25 09:00', periods=3)
print(tz_1)

tz_2 = tz_1.tz_localize(tz='Europe/Berlin')
print(tz_2)

Line [1] creates a date range based on a start date and a 3-day duration. The output saves to tz_1.
Line [2] outputs tz_1 to the terminal.
Line [3] localizes the time zone to Berlin. The output saves to tz_2.
Line [4] outputs tz_2 to the terminal.

Output

tz_1

DatetimeIndex(['2022-12-25 09:00:00', 
               '2022-12-26 09:00:00',              
               '2022-12-27 09:00:00'],              
               dtype='datetime64[ns]', freq='D')

tz_2

DatetimeIndex(['2022-12-25 09:00:00+01:00', 
               '2022-12-26 09:00:00+01:00',               
               '2022-12-27 09:00:00+01:00'],              
               dtype='datetime64[ns, Europe/Berlin]', freq=None)

💡 Note: The localized time zone displays as Europe/Berlin.

DataFrame tz_convert()

The tz_convert() method converts the time zone of a DataFrame to a different time zone.

To view a list of available time zones, click here or navigate to an IDE and run the following code:

$ print(pytz.all_timezones)

The syntax for this method is as follows:

DataFrame.tz_convert(tz, axis=0, level=None, copy=True)

Parameter	Description
`tz`	The parameter is a string of a valid time zone. Ex: `'America/Phoenix'`.
`axis`	If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row.
`level`	If `MultiIndex`, specify the conversion level.
`copy`	If `True`, this parameter makes a copy.

A traveler flies from Detroit, Michigan. They are on vacation for a week’s stay in three (3) cities near Berlin.

What is the time difference from their current location to back home in each city visited?

Run this code to find out!

df = pd.DataFrame({'Dest':  ['Berlin', 'Strausberg', 'Bernau'],
                   'Days':  [7, 7, 7]})
  
index_ = pd.date_range('2021-10-09 10:00', periods=3, freq='W', tz='America/Detroit')
df.index = index_
print(df)
print(df.index)

df = df.tz_convert(tz = 'Europe/Berlin')
print(df)
print(df.index)

Line [1] creates a DataFrame from a travel itinerary and saves it to df.
Line [2] creates an index based on a start date, three (3) weeks, a frequency ('W'), and a time zone.
Line [3] sets index_ as the index for the DataFrame.
Line [4] outputs the DataFrame to the terminal.
Line [5] outputs the index of the DataFrame to the terminal.
Line [6] converts the time zone from 'America/Detroit' to 'Europe/Berlin'.
Line [7] outputs the updated DataFrame to the terminal.
Line [8] outputs the updated index to the terminal.

Output – Before tz_convert

df

	Dest	Days
2021-10-10 10:00:00-04:00	Berlin	7
2021-10-17 10:00:00-04:00	Strausberg	7
2021-10-24 10:00:00-04:00	Bernau	7

df-index

DatetimeIndex(['2021-10-10 10:00:00-04:00', 
               '2021-10-17 10:00:00-04:00',
               '2021-10-24 10:00:00-04:00'],
               dtype='datetime64[ns, America/Detroit]', freq='W-SUN')

Output – After tz_convert

df

	Dest	Days
2021-10-10 16:00:00+02:00	Berlin	7
2021-10-17 16:00:00+02:00	Strausberg	7
2021-10-24 16:00:00+02:00	Bernau	7

df-index

DatetimeIndex(['2021-10-10 16:00:00+02:00', 
              '2021-10-17 16:00:00+02:00',
              '2021-10-24 16:00:00+02:00'],
              dtype='datetime64[ns, Europe/Berlin]', freq='W-SUN')

Note: Definition of frequency period [W-SUN]:

W: weekly
SUN: each week starts on a Sunday

DataFrame to_timestamp()

The to_timestamp() method casts (converts) data to a Datetimeindex of timestamps at the start of a selected period.

The syntax for this method is as follows:

DataFrame.to_timestamp(freq=None, how='start', axis=0, copy=True)

Parameter	Description
`freq`	This parameter is an available frequency of the `PeriodIndex` method.
`how`	This parameter is the period conversion to timestamp. The available options are: `'start'`, `'end'`, `'s'`, or `'e'`.
`axis`	If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row.
`copy`	If `True`, this parameter makes a copy.

For this example, we have four quarter earnings for Rivers Clothing for 2021. Each row displays a quarter-end date and total earning amount for that time.

earnings = [120545, 230574, 101155, 17598]
the_range = pd.period_range('2021Q1', '2021Q4', freq='Q-DEC')
times = pd.Series(earnings, the_range)
times.index = (the_range.asfreq('M', 'e')).asfreq('H', 's')+8
print(times)

Line [1] saves the quarterly earnings for Rivers Clothing in 2021 to a list.
Line [2] sets the date range (quarterly) and frequency. This output saves to the_range.
Line [3] sets the index and asfreq() month and hour. The start hour for each quarter is 8:00 am.
Line [4] outputs the times variable to the terminal.

Output

times


2021-03-01 08:00	120545
2021-06-01 08:00	230574
2021-09-01 08:00	101155
2021-12-01 08:00	17598
Freq: H, dtype: int64

Further Learning Resources

This is Part 18 of the DataFrame method series.

Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
Part 13 focuses on the DataFrame methods drop_level(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swap_level(), stack(), unstack() and swap_axes()
Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), t() and transpose()
Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
Part 21 focuses on the serialization and conversion methods from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickles().
Part 22 focuses on the serialization and conversion methods to_clipboard(), to_html(), to_sql(), to_csv(), and to_excel().
Part 23 focuses on the serialization and conversion methods to_markdown(), to_stata(), to_hdf(), to_latex(), to_xml().
Part 24 focuses on the serialization and conversion methods to_parquet(), to_feather(), to_string(), Styler.
Part 25 focuses on the serialization and conversion methods to_bgq() and to_coo().

Also, have a look at the Pandas DataFrame methods cheat sheet!