5 Best Ways to Retrieve Categories from a Pandas CategoricalIndex

💡 Problem Formulation: When working with categorical data in pandas, we often need to extract the categories used in a CategoricalIndex. Suppose you have a pandas DataFrame indexed by a categorical column and you want to retrieve the categories of that index. The input is a CategoricalIndex object and the desired output is a list of its unique categories.
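For context, here is one way such a CategoricalIndex can arise from a DataFrame. This is a minimal sketch: the column names "fruit" and "qty" and their values are made up for illustration. Setting a column with the category dtype as the index produces a CategoricalIndex.

```python
import pandas as pd

# Illustrative data: 'fruit' is a column with the category dtype
df = pd.DataFrame(
    {
        "fruit": pd.Categorical(["apple", "banana", "orange", "apple"]),
        "qty": [3, 5, 2, 7],
    }
)

# Moving the categorical column into the index yields a CategoricalIndex
indexed = df.set_index("fruit")
print(type(indexed.index).__name__)  # CategoricalIndex

# The categories can then be read from the index
print(indexed.index.categories.to_list())  # ['apple', 'banana', 'orange']
```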

Method 1: Using the .categories Attribute

This method directly accesses the categories attribute of the CategoricalIndex object, which returns a pandas Index holding every category defined for it, whether or not that category actually occurs in the data. It's the most straightforward way to get the categories.

Here’s an example:

import pandas as pd

# Create a categorical index
cat_index = pd.CategoricalIndex(['apple', 'banana', 'orange', 'apple', 'banana'])
# Get the categories
categories = cat_index.categories
print(categories)

Output:

Index(['apple', 'banana', 'orange'], dtype='object')

This code snippet creates a CategoricalIndex object and reads its .categories attribute. Because no categories were passed explicitly, pandas infers them from the distinct values (sorted), which in this case are 'apple', 'banana', and 'orange'.
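The distinction between categories and values matters when categories are declared explicitly. In this sketch (the fruit values are illustrative), 'orange' is declared but never used, and the categories keep the declared order rather than being sorted:

```python
import pandas as pd

# 'orange' is declared as a category but no value uses it
cat_index = pd.CategoricalIndex(
    ["apple", "banana", "apple"],
    categories=["orange", "banana", "apple"],
)

# .categories reports every declared category, in declared order
print(cat_index.categories.to_list())  # ['orange', 'banana', 'apple']

# The distinct values actually present are a smaller set
print(sorted(set(cat_index)))  # ['apple', 'banana']
```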

Method 2: Using the unique() Method

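The unique() method returns the distinct values present in the index, in order of first appearance, as a new CategoricalIndex. This can be useful when the order of appearance is significant. Note that it lists only values that actually occur, so categories that are declared but unused do not appear among its values.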

Here’s an example:

# Continuing from the previous categorical index
# Get unique categories
unique_categories = cat_index.unique()
print(unique_categories)

Output:

CategoricalIndex(['apple', 'banana', 'orange'], categories=['apple', 'banana', 'orange'], ordered=False, dtype='category')

This code calls unique() on the CategoricalIndex to retrieve its distinct values in the order they first appear. In this example that order happens to match the sorted categories, so the values look the same as in Method 1.
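To see where unique() and .categories diverge, the following sketch (with illustrative values) uses data whose first value is not the alphabetically first category and declares an unused category:

```python
import pandas as pd

# 'banana' appears first in the data; 'orange' is declared but unused
cat_index = pd.CategoricalIndex(
    ["banana", "apple", "banana"],
    categories=["apple", "banana", "orange"],
)

# All declared categories, in category order
print(cat_index.categories.to_list())  # ['apple', 'banana', 'orange']

# Only the values present, in order of first appearance
print(cat_index.unique().to_list())  # ['banana', 'apple']
```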

Method 3: Convert to a Series and Use .cat.categories

By converting a CategoricalIndex to a pandas Series, one can utilize the .cat.categories attribute to obtain the categories. This method may be preferred if additional Series functionality is required.

Here’s an example:

# Convert to Series
cat_series = pd.Series(cat_index)
# Get the categories
series_categories = cat_series.cat.categories
print(series_categories)

Output:

Index(['apple', 'banana', 'orange'], dtype='object')

Here, the CategoricalIndex is first converted to a pandas Series with the category dtype. The Series' .cat.categories attribute then returns the categories as a pandas Index.
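If the categorical data is still a DataFrame column rather than an index, no conversion is needed, because a column is already a Series. A minimal sketch, assuming an illustrative column named "fruit":

```python
import pandas as pd

df = pd.DataFrame({"fruit": pd.Categorical(["apple", "banana", "orange", "apple"])})

# A categorical column is already a Series, so .cat works directly
print(df["fruit"].cat.categories.to_list())  # ['apple', 'banana', 'orange']

# Converting a CategoricalIndex to a Series preserves its categories
cat_index = pd.CategoricalIndex(["apple", "banana", "orange", "apple", "banana"])
print(pd.Series(cat_index).cat.categories.equals(cat_index.categories))  # True
```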

Method 4: Using to_list() Method on categories

If a native Python list is needed, calling the to_list() method on the categories attribute will convert the index to a list. This can be more convenient when working outside of pandas.

Here’s an example:

# Get categories as a list
categories_list = cat_index.categories.to_list()
print(categories_list)

Output:

['apple', 'banana', 'orange']

The .to_list() method converts the pandas index that contains the categories into a native Python list, making it easy to work with in Python code that’s not specific to pandas.
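One practical benefit of to_list() is that the elements come back as native Python scalars rather than NumPy types. This sketch uses illustrative integer categories:

```python
import pandas as pd

# Illustrative integer categories; pandas infers them as sorted [1, 2, 3]
sizes = pd.CategoricalIndex([3, 1, 3, 2])

as_list = sizes.categories.to_list()
print(as_list)  # [1, 2, 3]

# Elements are plain Python ints, convenient for JSON or other non-pandas code
print(type(as_list[0]))  # <class 'int'>
```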

Bonus One-Liner Method 5: Using List Comprehension

If you prefer a more Pythonic one-liner approach, list comprehension can be used to extract categories from a CategoricalIndex.

Here’s an example:

categories_comp = [category for category in cat_index.categories]
print(categories_comp)

Output:

['apple', 'banana', 'orange']

This list comprehension iterates over the categories of the CategoricalIndex and builds a native Python list from them, giving the same result as calling to_list().
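A comprehension is most useful when each category needs to be transformed or filtered on the way out. In this sketch, upper-casing and excluding 'banana' are arbitrary choices made for illustration:

```python
import pandas as pd

cat_index = pd.CategoricalIndex(["apple", "banana", "orange", "apple", "banana"])

# Transform and filter categories in a single pass
labels = [category.upper() for category in cat_index.categories if category != "banana"]
print(labels)  # ['APPLE', 'ORANGE']

# Without a transformation, list() is the simpler equivalent
print(list(cat_index.categories))  # ['apple', 'banana', 'orange']
```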

Summary/Discussion

  • Method 1: Accessing .categories Attribute. The most direct method. Strengths: Simple and intuitive. Weaknesses: Returns a pandas Index object, not a list.
  • Method 2: Using the unique() Method. Returns the distinct values in order of first appearance. Strengths: Preserves order of appearance. Weaknesses: Omits declared but unused categories, and returns a CategoricalIndex rather than a list.
  • Method 3: Converting to a Series and Using .cat.categories. Leverages Series functionality. Strengths: Access to Series methods. Weaknesses: Extra step of conversion required.
  • Method 4: Using to_list() Method. Converts categories to a native Python list. Strengths: Native list format. Weaknesses: An additional method call is needed.
  • Bonus Method 5: List Comprehension. A Pythonic one-liner. Strengths: Concise, and easy to extend with a transformation or filter. Weaknesses: More verbose than to_list() or list() when no transformation is needed.