Converting integers to a categorical data type is a common preprocessing step in data analysis and machine learning workflows. The aim is to turn numeric codes into values that represent distinct categories or groups, so that they are analyzed as labels and not as quantities. For instance, if you have an integer representing a day of the week (1 for Monday, 2 for Tuesday, etc.), you may wish to map these to categorical names (‘Monday’, ‘Tuesday’, etc.) for clarity in your DataFrames.
Method 1: Using pandas’ astype() with CategoricalDtype
This method involves creating a categorical data type with predefined categories using pandas’ CategoricalDtype. It is highly customizable and leverages pandas for efficient data manipulation.
Here’s an example:
import pandas as pd
from pandas.api.types import CategoricalDtype

# Sample series of integers
s = pd.Series([1, 2, 3, 1, 2, 3])

# Define an ordered categorical type
cat_type = CategoricalDtype(categories=['Monday', 'Tuesday', 'Wednesday'], ordered=True)

# Map the integers to labels first, then cast to the categorical type
s_cat = s.map({1: 'Monday', 2: 'Tuesday', 3: 'Wednesday'}).astype(cat_type)

Output:

0       Monday
1      Tuesday
2    Wednesday
3       Monday
4      Tuesday
5    Wednesday
dtype: category
Categories (3, object): [Monday < Tuesday < Wednesday]

This snippet creates a pandas Series of integers, maps each integer to a day name, and casts the result to an ordered CategoricalDtype with astype(). The mapping step matters: astype() keeps only values that already appear in the categories, so casting the raw integers 1, 2, 3 directly would produce NaN in every row. The resulting ordered categorical is well suited to ordinal data.
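When the integers already line up with the category order, pd.Categorical.from_codes() may be a more direct route, since it treats each integer as a position in the category list. The sketch below assumes the integers run 1..N in the same order as the categories; the variable names are illustrative only. It also checks what happens when the raw integers are cast directly.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([1, 2, 3, 1, 2, 3])
cat_type = CategoricalDtype(categories=['Monday', 'Tuesday', 'Wednesday'], ordered=True)

# Casting the raw integers directly: none of 1, 2, 3 is a category,
# so every row ends up missing.
direct = s.astype(cat_type)
print(direct.isna().all())  # True

# from_codes() expects 0-based positions into the categories, hence s - 1.
# Assumption: the integers run 1..N in the same order as the category list.
codes = (s - 1).to_numpy()
from_codes = pd.Series(pd.Categorical.from_codes(codes, dtype=cat_type), index=s.index)

print(from_codes.tolist())
# ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Tuesday', 'Wednesday']

# Ordered categoricals support min/max and comparisons.
print(from_codes.min(), from_codes.max())  # Monday Wednesday
```

Because the type is ordered, comparisons such as from_codes > 'Monday' also work, which plain strings would evaluate alphabetically instead.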
Method 2: Using pandas’ map() Function
Mapping integers to categories directly using pandas’ map() function is a straightforward and simple approach. It allows for quick lookup substitutions and is highly readable.
Here’s an example:
import pandas as pd

# Sample series of integers
s = pd.Series([1, 2, 3, 1, 2, 3])

# Create a mapping dictionary
days_mapping = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday'}

# Map integers to categories
s_mapped = s.map(days_mapping)

Output:

0       Monday
1      Tuesday
2    Wednesday
3       Monday
4      Tuesday
5    Wednesday
dtype: object
This code maps a pandas Series of integers to a new Series of labels using the provided dictionary. Note that the result has dtype object, not category; append .astype('category') if you need a true categorical column. The map() function is user-friendly for small mappings.
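Two details of map() are easy to miss: integers without a dictionary entry become NaN, and the result is not yet a categorical. The following sketch illustrates both, using a made-up value 7 that has no entry in the mapping.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 7])  # 7 is deliberately absent from the mapping
days_mapping = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday'}

# Keys missing from the dictionary are mapped to NaN, not an error.
mapped = s.map(days_mapping)
print(mapped.isna().tolist())  # [False, False, False, True]

# Cast afterwards to get an actual categorical column.
as_category = mapped.astype('category')
print(as_category.dtype)                  # category
print(list(as_category.cat.categories))   # ['Monday', 'Tuesday', 'Wednesday']
```

Categories inferred this way are unordered and sorted alphabetically, so pass an explicit CategoricalDtype if the order matters.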
Method 3: Using List Comprehension with a Mapping Dictionary
List comprehension in Python is a compact and efficient way to apply a mapping from integers to categories. It does not require any special libraries, making it lightweight and universally applicable.
Here’s an example:
# Sample list of integers
int_list = [1, 2, 3, 1, 2, 3]

# Mapping dictionary
days_mapping = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday'}

# Convert list using comprehension
cat_list = [days_mapping[x] for x in int_list]
Output:
['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Tuesday', 'Wednesday']
Using Python’s list comprehension, we iterate over the list of integers and apply a mapping based on the dictionary. This converts each integer to its categorical equivalent, resulting in a new list of categories.
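Indexing the dictionary with days_mapping[x] raises a KeyError as soon as the list contains an integer that has no entry. If that can happen in your data, a variant using dict.get() with a fallback label is one option; the 'Unknown' label below is a placeholder chosen for illustration.

```python
# Sample list containing an integer (9) that is not in the mapping
int_list = [1, 2, 9, 3]
days_mapping = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday'}

# dict.get() returns the fallback instead of raising KeyError
cat_list = [days_mapping.get(x, 'Unknown') for x in int_list]
print(cat_list)  # ['Monday', 'Tuesday', 'Unknown', 'Wednesday']
```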
Method 4: Using pandas’ cut() Function
The cut() function in pandas bins values into discrete intervals. This way, you can define the ranges of integer values that correspond to each category, making it great for continuous variable discretization.
Here’s an example:
import pandas as pd

# Sample series of integers
s = pd.Series([1, 5, 9, 1, 5, 9])

# Defining bins and labels
bins = [0, 3, 6, 10]
labels = ['Low', 'Medium', 'High']

# Convert to categorical
s_binned = pd.cut(s, bins=bins, labels=labels)

Output:

0       Low
1    Medium
2      High
3       Low
4    Medium
5      High
dtype: category
Categories (3, object): [Low < Medium < High]
This approach uses binning to transform numerical data into categorical data. With the default right=True, the bins are the right-closed intervals (0, 3], (3, 6], and (6, 10], so 1, 5, and 9 fall into ‘Low’, ‘Medium’, and ‘High’ respectively. Because cut() returns an ordered categorical when labels are supplied, it is particularly useful for creating ordinal categories.
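Values that sit exactly on a bin edge are where cut() tends to surprise. The sketch below reuses the same bins and labels with a different, made-up sample series to show how the default right-closed intervals treat the edges, and how include_lowest=True changes the handling of the lowest edge.

```python
import pandas as pd

s = pd.Series([0, 3, 4, 6, 10])  # values chosen to land on bin edges
bins = [0, 3, 6, 10]
labels = ['Low', 'Medium', 'High']

# Default right=True gives (0, 3], (3, 6], (6, 10]: 0 is outside every bin.
default = pd.cut(s, bins=bins, labels=labels)
print(default.isna().tolist())  # [True, False, False, False, False]

# include_lowest=True closes the first interval on the left, so 0 becomes 'Low'.
with_lowest = pd.cut(s, bins=bins, labels=labels, include_lowest=True)
print(with_lowest.tolist())  # ['Low', 'Low', 'Medium', 'Medium', 'High']
```

Passing right=False instead would make the intervals left-closed, [0, 3), [3, 6), [6, 10), which moves 3 into ‘Medium’ and 6 into ‘High’ and leaves 10 unbinned.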
Bonus One-Liner Method 5: Using a Lambda Function with map()
For quick, in-line transformations without the need for external dictionaries or definitions, a lambda function combined with the map() method can be a nifty one-liner solution.
Here’s an example:
import pandas as pd

# Sample series of integers
s = pd.Series([1, 2, 3, 1, 2, 3])

# One-liner conversion
s_lambda_mapped = s.map(lambda x: {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday'}.get(x))

Output:

0       Monday
1      Tuesday
2    Wednesday
3       Monday
4      Tuesday
5    Wednesday
dtype: object
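This one-liner uses map() with a lambda function that performs the mapping inline. It’s quick and concise, but the dictionary literal is rebuilt on every call, and readability suffers with large maps or more complex logic. Because it relies on dict.get(), integers without an entry map to a missing value and do not raise an error.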
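One thing a lambda can do that a plain dictionary passed to map() cannot is supply a fallback label. A minimal sketch, assuming a placeholder label 'Unknown' and a dictionary named days that is built once outside the lambda:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 8])  # 8 has no entry in the mapping

# Built once, so the lambda does not recreate it for every element
days = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday'}

# dict.get() with a default labels unmapped integers explicitly
s_named = s.map(lambda x: days.get(x, 'Unknown'))
print(s_named.tolist())  # ['Monday', 'Tuesday', 'Wednesday', 'Unknown']
```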
Summary/Discussion
- Method 1: Using pandas’ astype() with CategoricalDtype. Customizable and efficient, with explicit control over categories and their order, but requires pandas and some setup.
- Method 2: Using pandas’ map() function. Quick and simple, but returns an object-dtype Series, so a further cast is needed for a true categorical.
- Method 3: List comprehension with mapping dictionary. Lightweight and universal but can become unwieldy with large lists or complex mappings.
- Method 4: Using pandas’ cut() function. Great for discretizing continuous variables into ordinal categories but not as straightforward for direct integer-to-name mapping.
- Bonus Method 5: Lambda function with map(). Extremely concise for small mappings but may suffer in readability and is not suitable for complex mappings.