Pandas DataFrame Methods: idxmax(), idxmin(), reindex(), reindex_like(), rename(), rename_axis() – Part 9

The Pandas DataFrame has several reindexing, selection, and label-manipulation methods. When applied to a DataFrame, these methods evaluate or modify its labels and return the result.

This is Part 9 of the DataFrame methods series:

  • Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
  • Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
  • Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
  • Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
  • Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
  • Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
  • Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
  • Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 9 focuses on the DataFrame methods idxmax(), idxmin(), reindex(), reindex_like(), rename(), and rename_axis().
  • Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
  • Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
  • Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
  • Part 13 focuses on the DataFrame methods droplevel(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
  • Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swaplevel(), stack(), unstack() and swapaxes()
  • Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), T and transpose()
  • Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
  • Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
  • Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
  • Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
  • Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
  • Part 21 focuses on converting one data type to/from another data type.
  • Part 22 focuses on converting one data type to/from another data type.
  • Part 23 focuses on converting one data type to/from another data type.
  • Part 24 focuses on converting one data type to/from another data type.
  • Part 25 focuses on converting one data type to/from another data type.

Preparation

Before any data manipulation can occur, one (1) new library will require installation:

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal and execute the command below at the command prompt. The terminal used in this example shows a dollar sign ($) as its prompt. Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

DataFrame idxmax()

The idxmax() method returns the index label of the first occurrence of the maximum value over the selected axis.

https://youtube.com/watch?v=8lY5p3VmGaE

The syntax for this method is as follows:

DataFrame.idxmax(axis=0, skipna=True)
Parameter    Description
axis         If zero (0) or index, return the row label of the maximum for each column. Default 0.
             If one (1) or columns, return the column label of the maximum for each row.
skipna       If True (the default), NaN/NULL values are excluded from the search.

For this example, the DataFrame for Rivers Clothing depicts their inventory based on available sizes (index). Running this code shows, for each item, the size with the highest (maximum) stock.

Code – Pandas Example

df_inv = pd.DataFrame({'Tops':   [22, 12,  19,   8, 23],
                       'Pants':  [5,    7,    17,  19, 12],
                       'Coats':  [11,  18,   1,   16,  3]},
                       index =  ['XS','S', 'M', 'L', 'XL'])

result = df_inv.idxmax(axis=0)
print(result)
  • Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
  • Line [2] retrieves, for each column, the index label of the maximum value. This output saves to the result variable.
  • Line [3] outputs the result to the terminal.

Output

Tops     XL
Pants     L
Coats     S
dtype: object
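
The axis parameter can also point the search across each row. The sketch below reuses the inventory data from this example. The variable name best_per_size is only illustrative.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops':  [22, 12, 19,  8, 23],
                       'Pants': [5,   7, 17, 19, 12],
                       'Coats': [11, 18,  1, 16,  3]},
                      index=['XS', 'S', 'M', 'L', 'XL'])

# axis=1 scans each row and returns the column label that holds
# that row's maximum value (the best-stocked item for each size).
best_per_size = df_inv.idxmax(axis=1)
print(best_per_size)
```

Here the result is indexed by size, and each value is an item name rather than a size.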

For this example, a Series records five (5) days of daytime highs. This method returns the index label of the maximum temperature.

Code – Series Example

temps = pd.Series(data=[5, 11, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])
print(temps.idxmax())
  • Line [1] creates a Series of temperatures and saves it to temps.
  • Line [2] retrieves the index label of the maximum value and outputs it to the terminal.

Output

Day-5

Note: The NumPy counterpart of this method is numpy.argmax, which returns the integer position of the maximum rather than its index label.
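
To make the relationship with numpy.argmax concrete, here is a minimal sketch using the same temperature Series. It assumes NumPy is installed (it ships as a Pandas dependency). The names label and position are illustrative.

```python
import numpy as np
import pandas as pd

temps = pd.Series(data=[5, 11, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])

label = temps.idxmax()                       # index label of the maximum
position = int(np.argmax(temps.to_numpy()))  # integer position of the maximum

print(label, position)
# Looking up the position in the index recovers the same label.
print(temps.index[position] == label)
```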


DataFrame idxmin()

The idxmin() method returns the index label of the first occurrence of the minimum value over the selected axis.

The syntax for this method is as follows:

DataFrame.idxmin(axis=0, skipna=True)
Parameter    Description
axis         If zero (0) or index, return the row label of the minimum for each column. Default 0.
             If one (1) or columns, return the column label of the minimum for each row.
skipna       If True (the default), NaN/NULL values are excluded from the search.

The DataFrame for Rivers Clothing depicts their inventory based on available sizes (index).

Running this code shows, for each item, the size with the lowest (minimum) stock.

Code – Pandas Example

df_inv = pd.DataFrame({'Tops':   [22, 12,  19,   8, 23],
                       'Pants':  [5,  7,  17,  19, 12],
                       'Coats':  [11,  18,  1,  16,  3]},
                       index =   ['XS','S', 'M', 'L', 'XL'])

result = df_inv.idxmin(axis=0)
print(result)
  • Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
  • Line [2] retrieves, for each column, the index label of the minimum value. This output saves to the result variable.
  • Line [3] outputs the result to the terminal.

Output

Tops      L
Pants    XS
Coats     M
dtype: object

For this example, a Series records five (5) days of daytime highs, with one reading missing. This method returns the index label of the minimum temperature.

Code – Series Example

temps = pd.Series(data=[5, None, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])
print(temps.idxmin())
  • Line [1] creates a Series of temperatures and saves it to temps.
  • Line [2] retrieves the index label of the minimum value, skipping the missing entry, and outputs the result to the terminal.

Output

Day-1
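
The Series above contains a missing reading, so the following sketch shows how the default skipna=True treats it. It reuses the same data. Comparing against dropna() is just one way to confirm the behavior, not the only one.

```python
import pandas as pd

temps = pd.Series(data=[5, None, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])

# None is stored as NaN, which upcasts the Series to float64.
print(temps.dtype)

# With the default skipna=True, the NaN on Day-2 is ignored.
print(temps.idxmin())

# Dropping missing values first gives the same label.
print(temps.dropna().idxmin())
```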

Note: The NumPy counterpart of this method is numpy.argmin, which returns the integer position of the minimum rather than its index label.


DataFrame reindex()

The reindex() method conforms a DataFrame/Series to a new index. Labels that did not exist in the original index receive NaN/NULL values unless a fill_value or a fill method is supplied.

The syntax for this method is as follows:

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, 
                  copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
Parameter    Description
labels       The new labels (index) to conform the axis given by axis to.
index        The new row labels. See the calling conventions below.
columns      The new column labels. See the calling conventions below.
axis         The axis that labels applies to: zero (0) or index for the rows, one (1) or columns for the columns.
method       Option to use when filling in NaN/NULL values when a reindex occurs. Available options are None, pad/ffill, backfill/bfill, or nearest. Only applies to a monotonically increasing/decreasing index.
copy         If True, return a new DataFrame/Series even if the new index is the same as the current one. By default, True.
level        The integer/name of the level if working with MultiIndex.
fill_value   Fill value to use for NaN/NULL values. By default, NaN.
limit        The maximum number of consecutive elements to forward/backward fill.
tolerance    Maximum distance between original labels and new labels for inexact matches.

The DataFrame reindex() method has two (2) calling conventions:

  • (index=index_labels, columns=column_labels)
  • (labels, axis={'index', 'columns'})
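
As a rough illustration of the two calling conventions, the sketch below reindexes the columns of a small, hypothetical two-row inventory both ways and checks that the results agree. The names new_cols, by_keyword, and by_axis are illustrative.

```python
import pandas as pd

# Hypothetical two-row inventory, for illustration only.
df_inv = pd.DataFrame({'Tops': [22, 12], 'Pants': [5, 7]}, index=['XS', 'S'])

new_cols = ['Pants', 'Tops', 'Coats']

# Convention 1: name the axis through the keyword.
by_keyword = df_inv.reindex(columns=new_cols)

# Convention 2: pass the labels and say which axis they belong to.
by_axis = df_inv.reindex(new_cols, axis='columns')

print(by_keyword)
print(by_keyword.equals(by_axis))
```

The new Coats column has no source data, so it is filled with NaN in both results.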

For this example, Rivers Clothing wants to drop size XL and add size XXS. Running the code below accomplishes this task.

df_inv = pd.DataFrame({'Tops':   [22, 12,  19,   8, 23],
                       'Pants':  [5,    7,    17,  19, 12],
                       'Coats':  [11,  18,   1,   16,  3]},
                       index =   ['XS', 'S',  'M',  'L',  'XL'])

new_index = ['XXS', 'XS', 'S', 'M', 'L']
result = df_inv.reindex(new_index, fill_value=0)
print(result)
  • Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
  • Line [2] creates a list of the new index labels (adding XXS and removing XL) and saves it to new_index.
  • Line [3] does the following:
    • Conforms the DataFrame to the new index.
    • Fills the values for the new XXS row with zeros (0).
    • Saves the output to result.
  • Line [4] outputs the result to the terminal.

Output

     Tops  Pants  Coats
XXS     0      0      0
XS     22      5     11
S      12      7     18
M      19     17      1
L       8     19     16
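
The method and tolerance parameters are easier to follow on a numeric index. The following is a minimal sketch on a hypothetical Series (stock, with made-up values), not part of the Rivers Clothing data.

```python
import pandas as pd

# Hypothetical data; fill methods require a monotonic index.
stock = pd.Series([100, 80, 60], index=[10, 20, 30])
new_index = [10, 15, 20, 25, 30, 35]

# Without a fill option, labels missing from the original become NaN.
plain = stock.reindex(new_index)

# method='ffill' carries the last known value forward to each new label.
filled = stock.reindex(new_index, method='ffill')

# method='nearest' picks the closest original label, but only when it
# lies within the tolerance; otherwise the value stays NaN.
near = stock.reindex([11, 19, 26], method='nearest', tolerance=2)

print(plain)
print(filled)
print(near)
```

Label 26 is four units away from its nearest original label (30), which exceeds the tolerance of 2, so it is left as NaN.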

DataFrame reindex_like()

The reindex_like() method returns an object (DataFrame/Series) whose row and column labels match those of another object (DataFrame/Series).

💡 Note: A new object (DataFrame/Series) is created unless the new index is the same as the current one and the copy parameter is False.

For this example, df1 holds a four-day forecast of daytime stats (Celsius, Fahrenheit, and wind speed), and df2 holds a three-day forecast (Celsius and wind speed).

df1 = pd.DataFrame([[24, 115, 'extreme'],
                    [31, 87,  'high'],
                    [22, 65,  'medium'],
                    [3,  9,   'low']],
                   columns=['Cel.', 'Fah.', 'Wind'],
                   index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))

df2 = pd.DataFrame([[8,  'low'],
                    [3,  'low'],
                    [54, 'medium']],
                   columns=['Cel.', 'Wind'],
                   index=pd.DatetimeIndex(['2014-02-12', '2014-02-13', '2014-02-15']))

print(df1)
result = df2.reindex_like(df1)
print(result)
  • Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df1.
  • Line [2] creates a DataFrame with Celsius and Wind for three (3) days using DatetimeIndex() and saves it to df2. The column names match those in df1 because reindex_like() aligns on both row and column labels.
  • Line [3] outputs df1 to the terminal.
  • Line [4] conforms df2 to the row and column labels of df1 and saves it to the result variable. Rows and columns missing from df2 are filled with NaN.
  • Line [5] outputs the result to the terminal.

Output

df1
            Cel.  Fah.     Wind
2014-02-12    24   115  extreme
2014-02-13    31    87     high
2014-02-14    22    65   medium
2014-02-15     3     9      low
result
            Cel.  Fah.    Wind
2014-02-12   8.0   NaN     low
2014-02-13   3.0   NaN     low
2014-02-14   NaN   NaN     NaN
2014-02-15  54.0   NaN  medium
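
One way to think about reindex_like() is as a shortcut for calling reindex() with another object's row and column labels. The sketch below checks this on two small, hypothetical frames (template and other). The names and values are illustrative only.

```python
import pandas as pd

# Hypothetical frames, for illustration only.
template = pd.DataFrame({'Cel.': [24, 31, 22], 'Wind': ['high', 'low', 'low']},
                        index=['Mon', 'Tue', 'Wed'])
other = pd.DataFrame({'Cel.': [8, 3]}, index=['Mon', 'Wed'])

# Conform 'other' to the labels of 'template' in one call...
via_like = other.reindex_like(template)

# ...or spell out the same request with reindex().
via_reindex = other.reindex(index=template.index, columns=template.columns)

print(via_like)
print(via_like.equals(via_reindex))
```

Only labels are taken from the template. Its values are never copied, which is why the Wind column and the Tue row come back as NaN.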

DataFrame rename()

The rename() method changes the axis label(s) in a DataFrame/Series.

The syntax for this method is as follows:

DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')
Parameter    Description
mapper       Dictionary or function transformations to apply to an axis. Use mapper with axis to specify the axis.
index        Rather than using mapper with axis=0, pass the row-label transformations here.
columns      Rather than using mapper with axis=1, pass the column-label transformations here.
axis         The axis that mapper applies to: zero (0) or index for the row labels, one (1) or columns for the column labels. Default 0.
copy         If set to True, a copy is created. This parameter is True by default.
inplace      If set to True, the changes apply to the original DataFrame. If False, the changes apply to a new DataFrame. By default, False.
level        If the axis is a MultiIndex, only rename labels in the specified level.
errors       If set to 'raise', a KeyError is raised when a label to rename is not found. If 'ignore', missing labels are skipped. By default, 'ignore'.

For this example, the columns of the same 4-day forecast DataFrame used above are renamed.

df = pd.DataFrame([[24, 115, 'extreme'],
                   [31, 87,  'high'],
                   [22, 65,  'medium'],
                   [3,  9,   'low']],
                  columns=['Cel.', 'Fah.', 'Wind'],
                  index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))

result = df.rename(columns={"Cel.": "Celsius", "Fah.": "Fahrenheit"})
print(result)
  • Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df.
  • Line [2] renames the columns to those set out in the columns parameter and saves it to the result variable.
  • Line [3] outputs the result to the terminal.

Output

            Celsius  Fahrenheit     Wind
2014-02-12       24         115  extreme
2014-02-13       31          87     high
2014-02-14       22          65   medium
2014-02-15        3           9      low
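
The mapper can also be a function, and the errors parameter controls what happens when a label is not found. The sketch below uses a small, hypothetical frame. The 'Kelvin' label is deliberately absent to trigger the error path.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only.
df = pd.DataFrame({'Cel.': [24, 31], 'Fah.': [115, 87]}, index=['a', 'b'])

# A function mapper is applied to every label on the chosen axis.
upper_rows = df.rename(index=str.upper)
print(list(upper_rows.index))

# With the default errors='ignore', a missing label is silently skipped.
unchanged = df.rename(columns={'Kelvin': 'K'})
print(list(unchanged.columns))

# With errors='raise', the same missing label raises a KeyError.
try:
    df.rename(columns={'Kelvin': 'K'}, errors='raise')
    raised = False
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```

Setting errors='raise' can help catch typos in column names that would otherwise go unnoticed.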

DataFrame rename_axis()

The rename_axis() method is similar in form to rename(), but instead of changing the labels themselves, it sets the name of the index or columns axis.

The syntax for this method is as follows:

DataFrame.rename_axis(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)
Parameter    Description
mapper       The value to set as the axis name.
index        A scalar, list, dictionary, or function applied to the name(s) of the index.
columns      A scalar, list, dictionary, or function applied to the name(s) of the columns. This parameter is ignored if the object is a Series.
axis         The axis to rename: zero (0) or index for the index, one (1) or columns for the columns. Default 0.
copy         If set to True, a copy is created. This parameter is True by default.
inplace      If set to True, the changes apply to the original DataFrame. If False, the changes apply to a new DataFrame. By default, False.

For this example, the index of the same 4-day forecast DataFrame as above is given a name.

df = pd.DataFrame([[24, 115, 'extreme'],
                   [31, 87,  'high'],
                   [22, 65,  'medium'],
                   [3,  9,   'low']],
                  columns=['Cel.', 'Fah.', 'Wind'],
                  index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))

result = df.rename_axis("Dates")
print(result)
  • Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df.
  • Line [2] sets the name of the index to Dates and saves it to the result variable.
  • Line [3] outputs the result to the terminal.

Output

            Cel.  Fah.     Wind
Dates
2014-02-12    24   115  extreme
2014-02-13    31    87     high
2014-02-14    22    65   medium
2014-02-15     3     9      low
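
To show that rename_axis() names the axes without touching the labels, here is a minimal sketch on a shortened, two-day version of the forecast. The axis name 'Metrics' is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Cel.': [24, 31], 'Fah.': [115, 87]},
                  index=pd.date_range(start='2014-02-12', periods=2, freq='D'))

# Name both axes in one call using the index/columns keywords.
named = df.rename_axis(index='Dates', columns='Metrics')

print(named.index.name, named.columns.name)

# The column labels themselves are unchanged.
print(list(named.columns) == list(df.columns))

# The original DataFrame is untouched because inplace defaults to False.
print(df.index.name)
```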