5 Best Ways to Format Pandas Period Object and Display Year without Century

💡 Problem Formulation: When working with time series data in Python using pandas, you may need to format a Period object so that it displays only the two-digit year, omitting the century. For instance, you have a Period object representing 2005 and you want to display it as '05'. This article provides several ways to achieve this within the pandas framework.

Method 1: Using strftime with Custom Formatting

Period objects provide a strftime() method that renders the period according to a format string. Given the format code %y, which stands for the two-digit year, it returns the year without the century.

Here's an example:

import pandas as pd

# Create an annual Period object ('Y' replaces the 'A' alias, deprecated in pandas 2.2)
period = pd.Period('2005', freq='Y')

# Format the period to display year without century
formatted_period = period.strftime('%y')
print(formatted_period)

Output:

05

This code creates a Period object representing the year 2005. It then formats the period using strftime() with the format code '%y', which represents a two-digit year, thus displaying the year without the century.
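The same format code works for other frequencies and can be combined with other directives. A minimal sketch, assuming a monthly Period (the variable name monthly is illustrative and not part of the example above):

```python
import pandas as pd

# Hypothetical monthly period, used only for illustration
monthly = pd.Period('2005-03', freq='M')

# '%y' still yields the two-digit year, regardless of the frequency
year_only = monthly.strftime('%y')

# Directives can be combined, e.g. month followed by two-digit year
month_and_year = monthly.strftime('%m/%y')

print(year_only)       # 05
print(month_and_year)  # 03/05
```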

Method 2: Direct String Slicing

If you know that the string representation of the Period object always ends with the year, as it does for annual periods such as '2005', string slicing is a straightforward way to extract the last two digits. This method is quick and needs no formatting codes, but it silently returns the wrong characters when the representation has a different layout.

Here's an example:

import pandas as pd

# Create a Period object
period = pd.Period('2005', freq='Y')

# Extract the last two digits to display year without century
year_without_century = str(period)[-2:]
print(year_without_century)

Output:

05

In this example, we convert the Period object to a string and then use slicing to get the last two characters, which correspond to the year without the century.
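The dependence on the string layout is easy to overlook. The sketch below, which assumes a monthly Period chosen purely for illustration, shows how slicing the string picks up the month instead of the year, and how slicing the integer year attribute avoids the problem:

```python
import pandas as pd

# Hypothetical monthly period, used only to illustrate the pitfall
monthly = pd.Period('2005-03', freq='M')

# str(monthly) is '2005-03', so the last two characters are the month
sliced = str(monthly)[-2:]

# Slicing the integer year instead does not depend on the frequency
from_year = str(monthly.year)[-2:]

print(sliced)     # 03
print(from_year)  # 05
```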

Method 3: Using Period.strftime with Lambda Functions

For more flexibility or when dealing with a Series of Periods, you can use the strftime() method within a lambda function to specify custom formatting on a pandas Series. The lambda function gives the ability to apply this method on an element-by-element basis.

Here's an example:

import pandas as pd

# Create a Series of Period objects
series_of_periods = pd.Series(pd.period_range('2000', periods=3, freq='Y'))

# Format each period in the Series to display the year without century
formatted_series = series_of_periods.apply(lambda x: x.strftime('%y'))
print(formatted_series)

Output:

0    00
1    01
2    02
dtype: object

Here we create a Series of Period objects and apply a lambda function that uses strftime() to format each Period to show only the last two digits of the year.
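For a Series with a period dtype, the .dt accessor also exposes strftime(), which formats every element without an explicit lambda. A minimal sketch using the same data as above:

```python
import pandas as pd

# Same three annual periods as in the example above
series_of_periods = pd.Series(pd.period_range('2000', periods=3, freq='Y'))

# The .dt accessor applies strftime to every element of the Series
formatted_series = series_of_periods.dt.strftime('%y')
print(formatted_series.tolist())  # ['00', '01', '02']
```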

Method 4: Using Period Index and Formatted Strings (f-strings)

This method builds a PeriodIndex and uses a list comprehension with a formatted string literal (f-string) to render each Period as a two-digit year.

Here's an example:

import pandas as pd

# Create a PeriodIndex object
period_index = pd.period_range('2000', periods=3, freq='Y')

# Build the two-digit year from each Period's integer year
formatted_list = [f'{period.year % 100:02d}' for period in period_index]
print(formatted_list)

Output:

['00', '01', '02']

The code snippet creates a PeriodIndex and then uses a list comprehension in which an f-string formats each Period's integer year: year % 100 drops the century, and the 02d format spec zero-pads the result to two digits. Formatting the integer avoids relying on Period accepting a strftime-style format spec such as {period:%y} inside an f-string, which is not guaranteed to work across pandas versions.
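If a plain list comprehension is not required, PeriodIndex also provides strftime() directly, which returns an Index of formatted strings. A short sketch under the same setup:

```python
import pandas as pd

# Same three annual periods as in the example above
period_index = pd.period_range('2000', periods=3, freq='Y')

# PeriodIndex.strftime formats every element and returns an Index of strings
formatted_list = list(period_index.strftime('%y'))
print(formatted_list)  # ['00', '01', '02']
```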

Bonus One-Liner Method 5: Using Vectorized String Operations

After converting a Series of Period objects to strings with astype(str), you can use pandas' vectorized string accessor (.str) to slice every element in one expression. This is concise and avoids writing an explicit Python-level function, although like Method 2 it depends on the string representation ending with the year.

Here's an example:

import pandas as pd

# Create a Series of Period objects
series_of_periods = pd.Series(pd.period_range('2000', periods=3, freq='Y'))

# Format whole Series to display years without century
formatted_series = series_of_periods.astype(str).str[-2:]
print(formatted_series)

Output:

0    00
1    01
2    02
dtype: object

We use pandas' vectorized string operations by converting the Series of Period objects to strings and then slicing the entire Series at once, so that every entry displays only the last two digits of the year.
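When the periods are not annual, a vectorized alternative that does not depend on the string layout is to work from the integer year. The sketch below assumes monthly periods spanning a century boundary, chosen only for illustration:

```python
import pandas as pd

# Hypothetical monthly periods: 1999-11, 1999-12, 2000-01
monthly_series = pd.Series(pd.period_range('1999-11', periods=3, freq='M'))

# Take the integer year, drop the century, then zero-pad to two characters
two_digit_years = (monthly_series.dt.year % 100).astype(str).str.zfill(2)
print(two_digit_years.tolist())  # ['99', '99', '00']
```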

Summary/Discussion

  • Method 1: Using strftime with Custom Formatting. This method is very clear and straightforward. Strength: High readability and standard approach. Weakness: May not be the most efficient for large data sets.
  • Method 2: Direct String Slicing. Very simple and does not require any pandas-specific methods. Strength: simplicity and speed. Weakness: Assumes a consistent string format and length.
  • Method 3: Using Period.strftime with Lambda Functions. Provides the flexibility of applying custom formatting to a Series of elements. Strength: Good for element-wise custom operations. Weakness: Slightly more complex and potentially less efficient than vectorized solutions.
  • Method 4: Using Period Index and Formatted Strings. Combines the intuitiveness of f-strings with the efficiency of list comprehensions. Strength: Pythonic and readable. Weakness: May require additional steps if working with Series instead of PeriodIndex.
  • Bonus Method 5: Using Vectorized String Operations. Converts the whole Series to strings and slices it with the .str accessor. Strength: Concise, with no explicit Python-level function. Weakness: Like Method 2, it assumes the string representation ends with the year, which holds for annual periods but not for other frequencies such as monthly.
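The efficiency remarks above are best checked on your own data. The following is a rough timing sketch, not a rigorous benchmark: the data size, the number of runs, and the inclusion of the .dt.strftime accessor for comparison are all assumptions made for illustration, and the absolute numbers depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

# Hypothetical benchmark data: 1,000 consecutive annual periods (2000 to 2999)
periods = pd.Series(pd.period_range('2000', periods=1000, freq='Y'))

# Series-level approaches to compare (.dt.strftime is added for comparison)
candidates = {
    'apply + lambda': lambda: periods.apply(lambda p: p.strftime('%y')),
    '.dt.strftime': lambda: periods.dt.strftime('%y'),
    'astype(str).str[-2:]': lambda: periods.astype(str).str[-2:],
}

# All approaches should agree on annual data before they are timed
results = {name: func().tolist() for name, func in candidates.items()}
reference = results['apply + lambda']
assert all(values == reference for values in results.values())

# Time each approach over a few runs and report the totals
timings = {name: timeit.timeit(func, number=5) for name, func in candidates.items()}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s for 5 runs')
```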