The set_index() and reset_index() methods are used on a Pandas DataFrame to manipulate its index.
- The set_index() method sets the index of the DataFrame from its existing columns.
- The reset_index() method restores the default index of the DataFrame.
Pandas set_index example
Let us create a Pandas DataFrame to show a basic usage example of the set_index() method.
Assume that a survey is conducted among programmers to observe some patterns. The survey asks the following questions:
- What are their names?
- What's their job category: freelancer or full-time jobholder?
- What is the programming language of their choice at work?
- What is their experience in the number of years?
- Which country do they belong to?
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({
   ...:     "name": ['Chris', 'Priyatham', 'Alice', 'Bob'],
   ...:     "category": ['freelancer', 'freelancer', 'fulltime_job', 'fulltime_job'],
   ...:     "prog_lang": ['Python', 'C', 'Python', 'C'],
   ...:     "exp": [5, 2, 15, 15],
   ...:     "country": ['Germany', 'India', 'France', 'USA']
   ...: })

In [3]: df
Out[3]:
        name      category prog_lang  exp  country
0      Chris    freelancer    Python    5  Germany
1  Priyatham    freelancer         C    2    India
2      Alice  fulltime_job    Python   15   France
3        Bob  fulltime_job         C   15      USA
Let's have a look at the set_index method's documentation. It shows that set_index() is a DataFrame method with four major parameters:
- keys
- drop
- append
- inplace
So, to make the name column of the above DataFrame the index, pass the column name as the keys parameter to the set_index() method:
In [4]: indexed_df = df.set_index('name')

In [5]: indexed_df
Out[5]:
               category prog_lang  exp  country
name
Chris        freelancer    Python    5  Germany
Priyatham    freelancer         C    2    India
Alice      fulltime_job    Python   15   France
Bob        fulltime_job         C   15      USA
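As a minimal sketch of what the new index makes possible, the snippet below rebuilds the survey DataFrame and looks up a row by name with .loc. The variable alice_lang is introduced here only for illustration. It also checks that the original df still has its name column, since set_index() returns a new DataFrame by default.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Chris", "Priyatham", "Alice", "Bob"],
    "category": ["freelancer", "freelancer", "fulltime_job", "fulltime_job"],
    "prog_lang": ["Python", "C", "Python", "C"],
    "exp": [5, 2, 15, 15],
    "country": ["Germany", "India", "France", "USA"],
})

# set_index() returns a new DataFrame; df itself is left untouched.
indexed_df = df.set_index("name")

# With name as the index, rows can be selected by label.
alice_lang = indexed_df.loc["Alice", "prog_lang"]  # illustrative variable
print(alice_lang)

# The name column is gone from the new DataFrame but still in the original.
print("name" in indexed_df.columns)
print("name" in df.columns)
```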
Pandas set_index inplace
Notice that in the process above, the set_index method generates a new DataFrame. Of the four major parameters, inplace lets us set the index on the same DataFrame instead. It's a boolean that defaults to False, so it needs to be changed to True.
This can be done with the following code:
In [6]: indexed_df_inplace = df.copy()

In [7]: indexed_df_inplace
Out[7]:
        name      category prog_lang  exp  country
0      Chris    freelancer    Python    5  Germany
1  Priyatham    freelancer         C    2    India
2      Alice  fulltime_job    Python   15   France
3        Bob  fulltime_job         C   15      USA

In [8]: indexed_df_inplace.set_index('name', inplace=True)

In [9]: indexed_df_inplace
Out[9]:
               category prog_lang  exp  country
name
Chris        freelancer    Python    5  Germany
Priyatham    freelancer         C    2    India
Alice      fulltime_job    Python   15   France
Bob        fulltime_job         C   15      USA
You can see in the above code that the indexed_df_inplace DataFrame replaced its default RangeIndex with a regular Index named name, built from the values of the name column.
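The change of index type can be checked directly. The following is a small sketch on a shortened, hypothetical two-row version of the survey data; the names before, after, and result are illustrative only. It also shows that a call with inplace=True returns None, so its result should not be assigned and used as a DataFrame.

```python
import pandas as pd

# Shortened, hypothetical version of the survey data.
df = pd.DataFrame({"name": ["Chris", "Alice"], "exp": [5, 15]})

# A freshly created DataFrame has a RangeIndex.
before = type(df.index).__name__

# With inplace=True the DataFrame is modified and None is returned.
result = df.set_index("name", inplace=True)

# The index is now a regular Index holding the names.
after = type(df.index).__name__

print(before, after, result)
```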
Whenever the index is set with the set_index method, the chosen column is removed from the DataFrame's columns and becomes the index. This happens because the drop parameter defaults to True. To keep the column intact, set the drop parameter to False.
It can be implemented by the following code:
In [10]: ind_df_inplace_intact = df.copy()

In [11]: ind_df_inplace_intact.set_index('name', inplace=True, drop=False)

In [12]: ind_df_inplace_intact
Out[12]:
                name      category prog_lang  exp  country
name
Chris          Chris    freelancer    Python    5  Germany
Priyatham  Priyatham    freelancer         C    2    India
Alice          Alice  fulltime_job    Python   15   France
Bob              Bob  fulltime_job         C   15      USA
From the above results, you can observe that the ind_df_inplace_intact DataFrame has the name column both among its regular columns and as the index.
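The remaining parameter from the list, append, is not demonstrated above. As a rough sketch, assuming the same survey data, append=True keeps the existing index and adds the new column as an extra level, which produces a MultiIndex. Passing a list of column names as keys has a similar effect in a single call. The names appended and multi are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Chris", "Priyatham", "Alice", "Bob"],
    "category": ["freelancer", "freelancer", "fulltime_job", "fulltime_job"],
    "prog_lang": ["Python", "C", "Python", "C"],
    "exp": [5, 2, 15, 15],
    "country": ["Germany", "India", "France", "USA"],
})

# Set name as the index first, then append country as a second level.
appended = df.set_index("name").set_index("country", append=True)
print(list(appended.index.names))

# A list of keys builds a two-level index in one call.
multi = df.set_index(["category", "name"])
print(multi.index.nlevels)
```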
Pandas reset_index()
The Pandas reset_index() method resets the index of a DataFrame to the default integer index, ranging from 0 to the length of the data minus one. It takes an optional level argument (an integer, a string, a tuple, or a list) that selects which index levels to remove from the index; by default, all levels are removed.
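The behaviour described above can be sketched on the survey data as follows. The variable names restored, dropped, and partial are illustrative. The sketch shows a plain reset, a reset with drop=True that discards the old index instead of turning it back into a column, and a reset of a single level of a two-level index.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Chris", "Priyatham", "Alice", "Bob"],
    "category": ["freelancer", "freelancer", "fulltime_job", "fulltime_job"],
    "prog_lang": ["Python", "C", "Python", "C"],
    "exp": [5, 2, 15, 15],
    "country": ["Germany", "India", "France", "USA"],
})
indexed_df = df.set_index("name")

# Plain reset: name becomes a regular column again and the index is 0..3.
restored = indexed_df.reset_index()
print(list(restored.columns))

# drop=True discards the old index instead of restoring it as a column.
dropped = indexed_df.reset_index(drop=True)
print(list(dropped.columns))

# level selects which level of a multi-level index to move back to a column.
multi = df.set_index(["category", "name"])
partial = multi.reset_index(level="category")
print(partial.index.name)
```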