The Pandas map( ) function is used to map each value from a Series object to another value using a dictionary/function/Series. It is a convenience function to map values of a Series from one domain to another domain.
Pandas map function
Let’s have a look at the documentation of the map function,
♥️ Info: Are you AI curious but you still have to create real impactful projects? Join our official AI builder club on Skool (only $5): SHIP! - One Project Per Month
- map is a Series method – operated on top of a Series object.
In the above, pandas.Series.map takes one major argument, “arg”.
As mentioned in the parameters above, there are 3 different types of possible placeholders for “arg”. In simple they are;
- A Dictionary
- A Function
- An Indexed Series
We’ll explore each of the above argument types in detail. You can use anyone based upon your use-case.
Let’s create a DataFrame that we can use further in the tutorial to explore the map function. The data we have is information about 4 persons;
>>> import pandas as pd
>>> df = pd.DataFrame(
... {
... 'Name': ['Edward', 'Natalie', 'Chris M', 'Priyatham'],
... 'Sex' : ['M', 'F', 'M', 'M'],
... 'Age': [45, 35, 29, 26],
... 'weight(kgs)': [68.4, 58.2, 64.3, 53.1]
... }
... )
>>>df
Name Sex Age weight(kgs)
0 Edward M 45 68.4
1 Natalie F 35 58.2
2 Chris M M 29 64.3
3 Priyatham M 26 53.1Pandas map dictionary to column
Each column in the DataFrame is of Series type. So, we can map a dictionary to a column in the DataFrame because the map is a Series method.
From the possible different types of arguments to the map function mentioned above, let’s use the dictionary type in this section. In Machine Learning, the data we provide to create models is always in numerical form. If you observe the “Sex” column’s dtype in the DataFrame below, it’s of String (object) type.
>>> df['Sex'] 0 M 1 F 2 M 3 M Name: Sex, dtype: object
All values of the “Sex” column values are one of the two discrete values – “M” or “F”. “M” representing Male and “F” representing Female. We can’t provide this column to build a Machine Learning model as it’s not of numerical type. So, the use-case is to convert this column to a numerical type. This kind of data is called “Categorical data” in Machine Learning terminology.
We shall use the map function with a dictionary argument to convert the “Sex” column to a numerical data type. This process of converting Categorical data to numerical data is referred to as “Encoding”. As we have only 2 categories this encoding process is called as “Binary Encoding”.
The code for it is,
>>> df['Sex'].map({'F':1, 'M':0})
0 0
1 1
2 0
3 0
Name: Sex, dtype: int64If you observe the above resultant Series, ‘M’ is mapped to 0 and ‘F’ is mapped to 1 in correspondence to the dictionary.
The above process of mapping using a dictionary can be visualised through the following animated video,
Pandas map function to column
From the possible different types of arguments to the map function mentioned above, let’s use the “Function” type in this section. Let’s achieve the same results of the above dictionary mapping using a Python function.
We need to create a function for it at first. The function should take all values in the “Sex” column one by one and convert them to respective integers.
>>> def sexInt(category): ... if category=='M': ... return 0 ... else: ... return 1
Now let’s use the above function to map it to the “Sex” column.
The code for it is,
>>> df['Sex'].map(sexInt) 0 0 1 1 2 0 3 0 Name: Sex, dtype: int64
The above result is the same as the result of using the dictionary argument. We can check it by comparison;
>>> df['Sex'].map({'M':0, 'F':1}) == df['Sex'].map(sexInt)
0 True
1 True
2 True
3 True
Name: Sex, dtype: boolFrom the above result, you can see that both results are equal.
The above process of mapping using a function can be visualised through the following animated video,
Pandas map Series to column values
From the possible different types of arguments to the map function mentioned above, let’s use the “Indexed Series” type in this section. The people in our DataFrame are ready to provide their nicknames to us. Assume that the nicknames are provided in a Series object. We would like to map our “Name” column of the DataFrame to the nicknames. The condition is;
- The index of the nicknames (called) Series should be equal to the “Name” (caller) column values.
Let’s construct the nicknames column below with the above condition,
>>> nick_Name = pd.Series(['Ed', 'Nat', 'Chris', 'Priyatham'], index=df['Name']) >>> nick_Name Name Edward Ed Natalie Nat Chris M Chris Priyatham Priyatham dtype: object
Let’s map the above created Series to the “Name” column of the Datarame;
The code for it is,
>>> df['Name'].map(nick_Name) 0 Ed 1 Nat 2 Chris 3 Priyatham Name: Name, dtype: object
- The major point of observation in applying the map function is – the index of the resultant Series index is equal to the caller index. This is important because we can add the resultant Series to DataFrame as a column.
Let’s add the resultant Series as a “nick_Name” column to the DataFrame,
>>> df['nick_Name'] = df['Name'].map(nick_Name)
>>> df
Name Sex Age weight(kgs) nick_Name
0 Edward M 45 68.4 Ed
1 Natalie F 35 58.2 Nat
2 Chris M M 29 64.3 Chris
3 Priyatham M 26 53.1 PriyathamThe above process of mapping using an indexed Series can be visualised through the following animated video,
Pandas map multiple columns
Every single column in a DataFrame is a Series and the map is a Series method. So, we have seen only mapping a single column in the above sections using the Pandas map function. But there are hacks in Pandas to make the map function work for multiple columns. Multiple columns combined together form a DataFrame. There is a process called stacking in Pandas. “Stacking” creates a Series of Series (columns) from a DataFrame. Here, all the columns of DataFrame are stacked as Series to form another Series.
We have encoded the “M” and “F” values to 0 and 1 in the previous section. When building Machine Learning models, there are chances where 1 is interpreted as greater than 0 in doing calculations. But, here they are 2 different categories and are not comparable.
So, let’s store the data in a different way in our DataFrame. Let’s dedicate separate columns for male (“M”) and female (“F”). And, we can fill in “Yes” and “No” for a person based upon their gender. This introduces the redundancy of the data but solves our discussed problem above.
It can be done so by the following code,
>>> df['Male'] = ['Yes', 'No', 'Yes', 'Yes']
>>> df['Female'] = ['No', 'Yes', 'No', 'No']
>>> df
Name Sex Age weight(kgs) nick_Name Male Female
0 Edward M 45 68.4 Ed Yes No
1 Natalie F 35 58.2 Nat No Yes
2 Chris M M 29 64.3 Chris Yes No
3 Priyatham M 26 53.1 Priyatham Yes NoNow, we shall map the 2 columns “Male” and “Female” to numerical values. To do so, we should take the subset of the DataFrame.
>>> df_subset = df[['Male', 'Female']] >>> df_subset Male Female 0 Yes No 1 No Yes 2 Yes No 3 Yes No
You can observe that we have a DataFrame of two columns above. The main point to note is both of the columns have the same set of possible values.
Thereafter, we will use the stacking hack and map two columns to the numerical values. This can be implemented using the following code,
>>> df_subset.stack()
0 Male Yes
Female No
1 Male No
Female Yes
2 Male Yes
Female No
3 Male Yes
Female No
dtype: object
>>> df_subset.stack().map({'Yes':1, 'No':0})
0 Male 1
Female 0
1 Male 0
Female 1
2 Male 1
Female 0
3 Male 1
Female 0
dtype: int64
>>> df_subset.stack().map({'Yes':1, 'No':0}).unstack()
Male Female
0 1 0
1 0 1
2 1 0
3 1 0If you observe the above code and results, the DataFrame is first stacked to form a Series. Then the map method is applied to the stacked Series. FInally unstacking it results in, numerical values replaced DataFrame.
In Machine Learning, there are routines to convert a categorical variable column to multiple discrete numerical columns. Such a process of encoding is termed as One-Hot Encoding in Machine Learning terminology.
Pandas map vs apply
We have discussed Pandas apply function in detail in another tutorial. The map and apply functions have some major differences between them. They are;
- The first difference is;
mapis only a Series method.applyis both the Series and DataFrame method.
- The second difference is;
maptakes dict / Series / function as an argumentapplytakes the only function as an argument
- The third difference is;
mapis an element-wise operation on Seriesapplyis used for complex element-wise operations on Series and DataFrame
- The fourth difference is;
mapis used majorly to map values using a dictionaryapplyis used for applying functions that are not available as vectorized aggregation routines on DataFrames
Conclusion and Next Steps
A map function is used majorly to map values of a Series using a dictionary. Whenever you find any categorical data, you can think of a map method to convert them to numerical values. If you liked this tutorial on the map( ) function and like quiz-based learning, please consider giving it a try to read our Coffee Break Pandas book.