Pandas DataFrame Methods: idxmax(), idxmin(), reindex(), reindex_like(), rename(), rename_axis() - Part 9

The Pandas DataFrame has several methods for reindexing, selection, and label manipulation. When applied to a DataFrame, these methods evaluate or modify its labels and return the results.

This is Part 9 of the DataFrame methods series:

Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail().
Part 9 focuses on the DataFrame methods idxmax(), idxmin(), reindex(), reindex_like(), rename(), and rename_axis().
Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
Part 13 focuses on the DataFrame methods droplevel(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swaplevel(), stack(), unstack() and swapaxes()
Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), T and transpose()
Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
Part 21 focuses on converting one data type to/from another data type.
Part 22 focuses on converting one data type to/from another data type.
Part 23 focuses on converting one data type to/from another data type.
Part 24 focuses on converting one data type to/from another data type.
Part 25 focuses on converting one data type to/from another data type.

Preparation

Before any data manipulation can occur, one (1) new library will require installation:

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. The terminal used in this example shows a dollar sign ($) as its prompt; yours may differ.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to Install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

DataFrame idxmax()

The idxmax() method returns the index label of the first occurrence of the maximum value over the selected axis.

https://youtube.com/watch?v=8lY5p3VmGaE

The syntax for this method is as follows:

DataFrame.idxmax(axis=0, skipna=True)

Parameter	Description
`axis`	If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row.
`skipna`	If set to `True` (the default), NaN/NULL values are excluded from the search.

For this example, the DataFrame for Rivers Clothing depicts their inventory based on available sizes (index). Running this code shows, for each clothing type, the size with the highest stock.

Code – Pandas Example

df_inv = pd.DataFrame({'Tops':   [22, 12,  19,   8, 23],
                       'Pants':  [5,    7,    17,  19, 12],
                       'Coats':  [11,  18,   1,   16,  3]},
                       index =  ['XS','S', 'M', 'L', 'XL'])

result = df_inv.idxmax(axis=0)
print(result)

Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
Line [2] retrieves the index label of the maximum value in each column. This output saves to the result variable.
Line [3] outputs the result to the terminal.

Output

Tops	XL
Pants	L
Coats	S
dtype: object

For this example, a Series records five (5) days of daytime highs. This method returns the index label of the maximum temperature.

Code – Series Example

temps = pd.Series(data=[5, 11, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])
print(temps.idxmax())

Line [1] creates a Series of temperatures and saves it to temps.
Line [2] retrieves the index label of the maximum value and prints it right away.

Output

Day-5

Note: The NumPy counterpart of this method is numpy.argmax, which returns the integer position of the maximum rather than its index label.
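The `axis` parameter and the NumPy note can be seen side by side in a minimal sketch. It reuses the inventory and temperature data from this section; the variable names `best_per_size`, `position`, and `label` are illustrative only and not part of the original example.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops':  [22, 12, 19,  8, 23],
                       'Pants': [5,   7, 17, 19, 12],
                       'Coats': [11, 18,  1, 16,  3]},
                      index=['XS', 'S', 'M', 'L', 'XL'])

# axis=1 scans each row and returns the column label holding the row maximum.
best_per_size = df_inv.idxmax(axis=1)
print(best_per_size)

temps = pd.Series(data=[5, 11, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])

# numpy.argmax returns an integer position, whereas idxmax returns a label.
position = temps.to_numpy().argmax()
label = temps.index[position]
print(position, label, temps.idxmax())
```

Looking the position up in the index gives the same label that idxmax() returns directly.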

DataFrame idxmin()

The idxmin() method returns the index label of the first occurrence of the minimum value over the selected axis.

The syntax for this method is as follows:

DataFrame.idxmin(axis=0, skipna=True)

Parameter	Description
`axis`	If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row.
`skipna`	If set to `True` (the default), NaN/NULL values are excluded from the search.

The DataFrame for Rivers Clothing depicts their inventory based on available sizes (index).

Running this code shows, for each clothing type, the size with the lowest stock.

Code – Pandas Example

df_inv = pd.DataFrame({'Tops':   [22, 12,  19,   8, 23],
                       'Pants':  [5,  7,  17,  19, 12],
                       'Coats':  [11,  18,  1,  16,  3]},
                       index =   ['XS','S', 'M', 'L', 'XL'])

result = df_inv.idxmin(axis=0)
print(result)

Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
Line [2] retrieves the index label of the minimum value in each column. This output saves to the result variable.
Line [3] outputs the result to the terminal.

Output

Tops	L
Pants	XS
Coats	M
dtype: object

For this example, a Series records five (5) days of daytime highs, one of which is missing. This method returns the index label of the minimum temperature.

Code – Series Example

temps = pd.Series(data=[5, None, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])
print(temps.idxmin())

Line [1] creates a Series of temperatures and saves it to temps.
Line [2] retrieves the index label of the minimum value, skipping the missing entry, and outputs the result to the terminal.

Output

Day-1

Note: The NumPy counterpart of this method is numpy.argmin, which returns the integer position of the minimum rather than its index label.
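Two details of idxmin() are easy to miss: missing values are skipped by default, and a tie resolves to the first occurrence. The sketch below is a hedged illustration using made-up readings (the tie on Day-4 is an assumption added for the demonstration, not part of the original data).

```python
import pandas as pd

# Day-2 is missing; Day-1 and Day-4 tie for the lowest reading.
temps = pd.Series(data=[5, None, 24, 5, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])

# With the default skipna=True the missing value is ignored,
# and the first of the tied labels is returned.
coldest_day = temps.idxmin()
print(coldest_day)

# The label can be used to look the value itself up again.
print(temps.loc[coldest_day])
```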

DataFrame reindex()

The reindex() method conforms a DataFrame/Series to a new index. Labels that did not exist in the original index receive NaN/NULL values, which this method can replace using the fill logic set by its parameters.

The syntax for this method is as follows:

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, 
                  copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

Parameter	Description
`labels`	The new labels (index or column names) that the axis given by `axis` should conform to.
`index`	See below.
`columns`	See below.
`axis`	The axis that `labels` targets: zero (0) or index for the rows, one (1) or columns for the columns.
`method`	Option to use when filling in NaN/NULL values when a reindex occurs. Only applies to a monotonically increasing/decreasing index. Available options are `None`, `pad`/`ffill`, `backfill`/`bfill`, or `nearest`.
`copy`	If `True`, a new DataFrame/Series is returned even when the new index equals the current one. By default, `True`.
`level`	The integer/name of the level if working with `MultiIndex`.
`fill_value`	Fill value to use for NaN/NULL values.
`limit`	The maximum number of elements to forward/backward fill.
`tolerance`	Maximum distance from original labels and new labels for inexact matches.

The DataFrame reindex() method has two (2) calling conventions:

(index=index_labels, columns=column_labels)
(labels, axis={'index', 'columns'})
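The two calling conventions can be compared directly. The following is a minimal sketch on a cut-down version of the inventory data; the names `by_keyword` and `by_axis` are illustrative.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops':  [22, 12, 19],
                       'Pants': [5,   7, 17]},
                      index=['XS', 'S', 'M'])

# Keyword convention: name the axis through the index/columns keywords.
by_keyword = df_inv.reindex(index=['S', 'M', 'L'], columns=['Pants', 'Tops'])

# Labels-plus-axis convention: one axis per call, so two calls are chained.
by_axis = (df_inv.reindex(['S', 'M', 'L'], axis='index')
                 .reindex(['Pants', 'Tops'], axis='columns'))

print(by_keyword)
print(by_keyword.equals(by_axis))
```

Both routes should produce the same frame; the keyword form is usually easier to read because the targeted axis is visible at the call site.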

For this example, Rivers Clothing wants to replace XL with XXS. Running the code below accomplishes this task.

df_inv = pd.DataFrame({'Tops':   [22, 12,  19,   8, 23],
                       'Pants':  [5,    7,    17,  19, 12],
                       'Coats':  [11,  18,   1,   16,  3]},
                       index =   ['XS', 'S',  'M',  'L',  'XL'])

new_index = ['XXS', 'XS', 'S', 'M', 'L']
result = df_inv.reindex(new_index, fill_value=0)
print(result)

Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
Line [2] lists the labels for the new index (adding XXS and leaving out XL) and saves them to new_index.
Line [3] does the following:
- Conforms the DataFrame to the new index.
- Fills the vacant values with zeros (0).
- Saves the output to result.
Line [4] outputs the result to the terminal.

Output

	Tops	Pants	Coats
XXS	0	0	0
XS	22	5	11
S	12	7	18
M	19	17	1
L	8	19	16
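The `method` and `limit` parameters only come into play when the index is ordered. As a hedged sketch, the hypothetical stock counts below (not part of the Rivers Clothing data) are recorded on a few days only and then forward-filled, carrying each value over at most two missing days.

```python
import pandas as pd

# Hypothetical stock counts recorded only on some days.
stock = pd.Series([10, 7, 4], index=[1, 4, 8])

# Forward-fill from the last known label, but carry a value at most 2 steps.
filled = stock.reindex(range(1, 9), method='ffill', limit=2)
print(filled)
```

Day 7 lies three steps after the last recorded day (4), so it stays NaN while days 5 and 6 inherit the value from day 4.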

DataFrame reindex_like()

The reindex_like() method returns an object (DataFrame/Series) with matching indexes as another object (DataFrame/Series).

💡 Note: A new object (DataFrame/Series) is created unless the new index is the same as the current one and the copy parameter is False.

For this example, the DataFrames (df1 & df2) contain a 4-day and a 3-day forecast of daytime stats: Celsius, Fahrenheit, and Wind for df1, and Celsius and Wind for df2.

df1 = pd.DataFrame([[24, 115, 'extreme'],
                    [31, 87,  'high'],
                    [22, 65,  'medium'],
                    [3,  9,   'low']],
                   columns=['Cel.', 'Fah.', 'Wind'],
                   index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))

df2 = pd.DataFrame([[8,  'low'],
                    [3,  'low'],
                    [54, 'medium']],
                   columns=['Cel.', 'Wind'],
                   index=pd.DatetimeIndex(['2014-02-12', '2014-02-13', '2014-02-15']))

print(df1)
result = df2.reindex_like(df1)
print(result)

Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df1.
Line [2] creates a DataFrame with Celsius and Wind for three (3) days using DatetimeIndex() and saves it to df2. Its column names match those of df1 so that reindex_like() can line the columns up.
Line [3] outputs df1 to the terminal.
Line [4] conforms df2 to the index and columns of df1 with reindex_like() and saves it to the result variable.
Line [5] outputs the result to the terminal.

Output

df1

	Cel.	Fah.	Wind
2014-02-12	24	115	extreme
2014-02-13	31	87	high
2014-02-14	22	65	medium
2014-02-15	3	9	low

result

	Cel.	Fah.	Wind
2014-02-12	8.0	NaN	low
2014-02-13	3.0	NaN	low
2014-02-14	NaN	NaN	NaN
2014-02-15	54.0	NaN	medium
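One way to reason about reindex_like() is as shorthand for a reindex() call that borrows both the index and the columns of the other object. The sketch below checks that reading on two small stand-in frames; `template` and `partial` are hypothetical names and values chosen for illustration.

```python
import pandas as pd

# Small stand-in frames; two of the three column names overlap.
template = pd.DataFrame({'Cel.': [24, 31, 22],
                         'Fah.': [75, 88, 72],
                         'Wind': ['high', 'low', 'low']},
                        index=pd.date_range('2014-02-12', periods=3, freq='D'))
partial = pd.DataFrame({'Cel.': [8, 3],
                        'Wind': ['low', 'medium']},
                       index=pd.DatetimeIndex(['2014-02-12', '2014-02-14']))

# Conform `partial` to the shape of `template` in two equivalent ways.
via_like = partial.reindex_like(template)
via_reindex = partial.reindex(index=template.index, columns=template.columns)

print(via_like)
print(via_like.equals(via_reindex))
```

Rows and columns that exist only in `template` come back filled with NaN, which is why matching column names matter.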

DataFrame rename()

The rename() method changes the axis label(s) in a DataFrame/Series.

The syntax for this method is as follows:

DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')

Parameter	Description
`mapper`	Dictionary or function transformations to apply to an axis. Use mapper with an axis to specify the axis.
`index`	A dictionary or function applied to the index labels. An alternative to passing `mapper` with `axis=0`.
`columns`	A dictionary or function applied to the column labels. An alternative to passing `mapper` with `axis=1`.
`axis`	The axis that `mapper` targets: zero (0) or index for the row labels, one (1) or columns for the column labels. If omitted, `mapper` targets the index.
`copy`	If set to `True`, the underlying data is copied. This parameter is `True` by default.
`inplace`	If set to `True`, the changes apply to the original DataFrame. If `False`, the changes apply to a new DataFrame. By default, `False`.
`level`	For a `MultiIndex`, renames only the labels in the specified level.
`errors`	If set to `'raise'`, a `KeyError` is raised when the mapping contains labels that are missing from the axis; if `'ignore'`, such labels are skipped. By default, `'ignore'`.

For this example, the columns of the same 4-day forecast DataFrame used above are renamed.

df = pd.DataFrame([[24, 115, 'extreme'],
                   [31, 87,  'high'],
                   [22, 65,  'medium'],
                   [3,  9,   'low']],
                  columns=['Cel.', 'Fah.', 'Wind'],
                  index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))

result = df.rename(columns={"Cel.": "Celsius", "Fah.": "Fahrenheit"})
print(result)

Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df.
Line [2] renames the columns to those set out in the columns parameter and saves it to the result variable.
Line [3] outputs the result to the terminal.

Output

	Celsius	Fahrenheit	Wind
2014-02-12	24	115	extreme
2014-02-13	31	87	high
2014-02-14	22	65	medium
2014-02-15	3	9	low
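Besides a dictionary for the columns, rename() also accepts a function, can relabel the index, and can be told to fail on unknown labels. The following is a minimal sketch; the two-row frame and the labels 'day1', 'Mon', and 'Kelvin' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[24, 115, 'extreme'],
                   [31, 87,  'high']],
                  columns=['Cel.', 'Fah.', 'Wind'],
                  index=['day1', 'day2'])

# A function mapper is applied to every label on the chosen axis.
upper = df.rename(columns=str.upper)
print(list(upper.columns))

# index= relabels rows; labels missing from the mapping are left as they are.
relabeled = df.rename(index={'day1': 'Mon'})
print(list(relabeled.index))

# With errors='raise', a key that matches no label raises a KeyError.
try:
    df.rename(columns={'Kelvin': 'K'}, errors='raise')
    raised = False
except KeyError:
    raised = True
print(raised)
```

With the default errors='ignore', the last call would silently return an unchanged copy, which can hide typos in a mapping.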

DataFrame rename_axis()

The rename_axis() method sets the name of the index or columns axis. Unlike rename(), it leaves the labels themselves unchanged.

The syntax for this method is as follows:

DataFrame.rename_axis(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)

Parameter	Description
`mapper`	The value to set as the axis name.
`index`	A scalar, list, dictionary, or function used to set the name(s) of the index axis.
`columns`	A scalar, list, dictionary, or function used to set the name(s) of the columns axis. The `columns` parameter is ignored if the object is a Series.
`axis`	The axis that `mapper` names: zero (0) or index for the rows, one (1) or columns for the columns. Default 0.
`copy`	If set to `True`, a copy creates. This parameter is `True` by default.
`inplace`	If set to `True`, the changes apply to the original DataFrame. If `False`, the changes apply to a new DataFrame. By default, `False`.

For this example, the index of the same 4-day forecast DataFrame as above is given a name.

df = pd.DataFrame([[24, 115, 'extreme'],
                   [31, 87,  'high'],
                   [22, 65,  'medium'],
                   [3,  9,   'low']],
                  columns=['Cel.', 'Fah.', 'Wind'],
                  index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))

result = df.rename_axis("Dates")
print(result)

Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df.
Line [2] renames the index and saves it to the result variable.
Line [3] outputs the result to the terminal.

Output

Dates	Cel.	Fah.	Wind
2014-02-12	24	115	extreme
2014-02-13	31	87	high
2014-02-14	22	65	medium
2014-02-15	3	9	low
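The difference between rename_axis() and rename() is easiest to see when both are applied to the same object. This is a hedged sketch on a shortened forecast frame; the axis name 'Stats' and the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Cel.': [24, 31], 'Fah.': [115, 87]},
                  index=pd.date_range('2014-02-12', periods=2, freq='D'))

# rename_axis names the axes; the row and column labels stay untouched.
named = df.rename_axis(index='Dates', columns='Stats')
print(named.index.name, named.columns.name)
print(list(named.columns) == list(df.columns))

# rename changes the labels; the axis names stay untouched.
relabeled = named.rename(columns={'Cel.': 'Celsius'})
print(list(relabeled.columns), relabeled.columns.name)
```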