The set_index() and reset_index() methods are used on top of a Pandas DataFrame to manipulate its index column.
- The method set_index() is used to set the index of the DataFrame from the existing columns.
- The method reset_index() is used to get back to the default index of the dataset.
Pandas set_index example
Let us create a Pandas DataFrame to show a basic example usage of the set_index() method.
Assume that a survey is conducted among programmers to observe some patterns. The survey collects the following data:
- What are their names?
- What's their job category, that is, are they freelancers or full-time jobholders?
- What is the programming language of their choice at work?
- What is their experience in the number of years?
- Which country do they belong to?
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({
...: "name": ['Chris', 'Priyatham', 'Alice', 'Bob'],
...: "category": ['freelancer', 'freelancer', 'fulltime_job', 'fulltime_job'],
...: "prog_lang": ['Python', 'C', 'Python', 'C'],
...: "exp": [5, 2, 15, 15],
...: "country": ['Germany', 'India', 'France', 'USA']
...: })
In [3]: df
Out[3]:
        name      category  prog_lang  exp  country
0      Chris    freelancer     Python    5  Germany
1  Priyatham    freelancer          C    2    India
2      Alice  fulltime_job     Python   15   France
3        Bob  fulltime_job          C   15      USA
Let's have a look at the set_index method's documentation.
According to the documentation, set_index() is a DataFrame method. It has four major parameters:
- keys
- drop
- append
- inplace
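As a rough sketch of how these four parameters appear in a call, the script below spells out their default values and then varies keys and append. It reuses the survey data from above; the variable names explicit_df, multi_df, and appended_df are made up for illustration.

```python
import pandas as pd

# Same survey data as in the example above.
df = pd.DataFrame({
    "name": ['Chris', 'Priyatham', 'Alice', 'Bob'],
    "category": ['freelancer', 'freelancer', 'fulltime_job', 'fulltime_job'],
    "prog_lang": ['Python', 'C', 'Python', 'C'],
    "exp": [5, 2, 15, 15],
    "country": ['Germany', 'India', 'France', 'USA'],
})

# All four parameters written out with their default values;
# this is equivalent to df.set_index('name').
explicit_df = df.set_index(keys='name', drop=True, append=False, inplace=False)

# keys may also be a list of column labels, which produces a MultiIndex.
multi_df = df.set_index(['category', 'name'])

# append=True keeps the existing index and adds the key as an extra level.
appended_df = df.set_index('name', append=True)

print(explicit_df.index.name)
print(list(multi_df.index.names))
print(appended_df.index.nlevels)
```

The rest of this section looks at keys, inplace, and drop one at a time.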
To make the name column of the above DataFrame the index, pass the column name as the keys parameter of the set_index() method:
In [4]: indexed_df = df.set_index('name')
In [5]: indexed_df
Out[5]:
               category  prog_lang  exp  country
name
Chris        freelancer     Python    5  Germany
Priyatham    freelancer          C    2    India
Alice      fulltime_job     Python   15   France
Bob        fulltime_job          C   15      USA
Pandas set_index inplace
Notice that in the process above, the set_index method returns a new DataFrame and leaves the original unchanged. Of the four major parameters, inplace lets us set the index on the same DataFrame instead. It is a boolean that defaults to False, so it needs to be set to True.
This can be done with the following code:
In [6]: indexed_df_inplace = df.copy()
In [7]: indexed_df_inplace
Out[7]:
        name      category  prog_lang  exp  country
0      Chris    freelancer     Python    5  Germany
1  Priyatham    freelancer          C    2    India
2      Alice  fulltime_job     Python   15   France
3        Bob  fulltime_job          C   15      USA
In [8]: indexed_df_inplace.set_index('name', inplace=True)
In [9]: indexed_df_inplace
Out[9]:
               category  prog_lang  exp  country
name
Chris        freelancer     Python    5  Germany
Priyatham    freelancer          C    2    India
Alice      fulltime_job     Python   15   France
Bob        fulltime_job          C   15      USA
As you can see in the above code, the indexed_df_inplace DataFrame replaced its default RangeIndex with an index built from the name column.
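One detail worth checking for yourself is what the call returns. The minimal sketch below, using a cut-down version of the survey data and the hypothetical names in_place, result, and reassigned, compares inplace=True with plain reassignment.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ['Chris', 'Priyatham', 'Alice', 'Bob'],
    "exp": [5, 2, 15, 15],
})

# inplace=True modifies in_place itself; the call returns None.
in_place = df.copy()
result = in_place.set_index('name', inplace=True)

# Without inplace, the new DataFrame has to be assigned back.
reassigned = df.copy()
reassigned = reassigned.set_index('name')

print(result is None)               # True
print(in_place.equals(reassigned))  # True
```

Because the inplace call returns None, writing something like df = df.set_index('name', inplace=True) would replace df with None, which is a common mistake.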
Whenever the index is set using the set_index method, the chosen column is removed from the DataFrame's columns and becomes the index. This is because the drop parameter defaults to True. To keep the column intact, set the drop parameter to False.
This can be implemented with the following code:
In [10]: ind_df_inplace_intact = df.copy()
In [11]: ind_df_inplace_intact.set_index('name', inplace=True, drop=False)
In [12]: ind_df_inplace_intact
Out[12]:
                name      category  prog_lang  exp  country
name
Chris          Chris    freelancer     Python    5  Germany
Priyatham  Priyatham    freelancer          C    2    India
Alice          Alice  fulltime_job     Python   15   France
Bob              Bob  fulltime_job          C   15      USA
From the above results, you can observe that the ind_df_inplace_intact DataFrame has the name column present both among the regular columns and as the index.
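The effect of drop can also be checked programmatically. The following sketch uses a cut-down version of the survey data, with the illustrative names dropped and kept, and compares the two settings side by side.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ['Chris', 'Priyatham', 'Alice', 'Bob'],
    "exp": [5, 2, 15, 15],
})

dropped = df.set_index('name')           # drop=True is the default
kept = df.set_index('name', drop=False)  # keep the column as well

print('name' in dropped.columns)               # False
print('name' in kept.columns)                  # True
print(list(kept.index) == list(kept['name']))  # True
```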
Pandas reset_index()
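Pandas reset_index() method resets the index of a DataFrame to the default integer index, which runs from 0 to one less than the number of rows. By default, the old index is moved back into the DataFrame as a column. Its level parameter accepts an integer, a string, a tuple, or a list, and selects which levels to remove from the index; if it is not given, all levels are removed.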
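The behaviour described above can be sketched as follows. The example reuses part of the survey data; the names restored, discarded, and partial are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ['Chris', 'Priyatham', 'Alice', 'Bob'],
    "category": ['freelancer', 'freelancer', 'fulltime_job', 'fulltime_job'],
    "exp": [5, 2, 15, 15],
})

indexed_df = df.set_index('name')

# Default: the index moves back into a column and a RangeIndex is restored.
restored = indexed_df.reset_index()

# drop=True discards the old index instead of turning it into a column.
discarded = indexed_df.reset_index(drop=True)

# level removes only the chosen level(s) of a MultiIndex.
multi_df = df.set_index(['category', 'name'])
partial = multi_df.reset_index(level='category')

print(restored.columns.tolist())   # ['name', 'category', 'exp']
print(discarded.columns.tolist())  # ['category', 'exp']
print(partial.index.name)          # name
```

Like set_index(), reset_index() returns a new DataFrame unless inplace=True is passed.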