The Pandas DataFrame has several reindexing, selection, and label-manipulation methods. When applied to a DataFrame, these methods look up or modify its index and column labels and return the result.
This is Part 9 of the DataFrame methods series.
Preparation
Before any data manipulation can occur, one (1) new library will require installation:
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
DataFrame idxmax()
The idxmax() method returns the index label of the first occurrence of the maximum value along the selected axis.
The syntax for this method is as follows:
DataFrame.idxmax(axis=0, skipna=True)
Parameter | Description |
---|---|
axis | If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row. |
skipna | If set to True (the default), NaN/NULL values are excluded from the search. |
For this example, the DataFrame for Rivers Clothing depicts their inventory based on available sizes (the index). Running this code shows, for each column, the index label that holds the maximum (highest) value.
Code: Pandas Example
df_inv = pd.DataFrame({'Tops': [22, 12, 19, 8, 23], 'Pants': [5, 7, 17, 19, 12], 'Coats': [11, 18, 1, 16, 3]}, index=['XS', 'S', 'M', 'L', 'XL'])
result = df_inv.idxmax(axis=0)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
- Line [2] retrieves the index label of the maximum value in each column. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output
Tops | XL |
Pants | L |
Coats | S |
dtype: object |
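As a complementary sketch (not part of the original example), passing axis=1 flips the question: for each size, which garment has the most stock? The variable name by_size is an illustrative assumption.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops': [22, 12, 19, 8, 23],
                       'Pants': [5, 7, 17, 19, 12],
                       'Coats': [11, 18, 1, 16, 3]},
                      index=['XS', 'S', 'M', 'L', 'XL'])

# axis=1 scans each row and returns the column label of that row's largest value
by_size = df_inv.idxmax(axis=1)
print(by_size)
```

Here the result is indexed by size, and each value is a column name rather than a size label.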
For this example, a Series records five (5) days of daytime highs. This method returns the index label of the maximum temperature.
Code: Series Example
temps = pd.Series(data=[5, 11, 24, 35, 49], index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])
print(temps.idxmax())
- Line [1] creates a Series of temperatures and saves it to temps.
- Line [2] retrieves the index label of the maximum value and outputs it to the terminal.
Output
Day-5
Note: The NumPy counterpart of this method is numpy.argmax, which returns the integer position of the maximum rather than its index label.
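The relationship between the two can be sketched as follows, assuming the same temperature Series as above. The names label and position are illustrative only.

```python
import numpy as np
import pandas as pd

temps = pd.Series(data=[5, 11, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])

label = temps.idxmax()                       # index label of the largest value
position = int(np.argmax(temps.to_numpy()))  # zero-based position of the largest value

print(label, position)
# Looking the position up in the index recovers the same label
print(temps.index[position] == label)
```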
DataFrame idxmin()
The idxmin() method returns the index label of the first occurrence of the minimum value along the selected axis.
The syntax for this method is as follows:
DataFrame.idxmin(axis=0, skipna=True)
Parameter | Description |
---|---|
axis | If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row. |
skipna | If set to True (the default), NaN/NULL values are excluded from the search. |
The DataFrame for Rivers Clothing depicts their inventory based on available sizes (indexes).
Running this code shows, for each column, the index label that holds the minimum (lowest) value.
Code: Pandas Example
df_inv = pd.DataFrame({'Tops': [22, 12, 19, 8, 23], 'Pants': [5, 7, 17, 19, 12], 'Coats': [11, 18, 1, 16, 3]}, index=['XS', 'S', 'M', 'L', 'XL'])
result = df_inv.idxmin(axis=0)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
- Line [2] retrieves the index label of the minimum value in each column. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output
Tops | L |
Pants | XS |
Coats | M |
dtype: object |
For this example, a Series records five (5) days of daytime highs, one of which is missing. This method returns the index label of the minimum temperature.
Code: Series Example
temps = pd.Series(data=[5, None, 24, 35, 49], index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])
print(temps.idxmin())
- Line [1] creates a Series of temperatures and saves it to temps.
- Line [2] retrieves the index label of the minimum value, skipping the missing entry, and outputs the result to the terminal.
Output
Day-1
Note: The NumPy counterpart of this method is numpy.argmin, which returns the integer position of the minimum rather than its index label.
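Because the Series above contains a missing value, the comparison with NumPy needs some care. The sketch below assumes the same data and contrasts idxmin() (which skips NaN by default) with numpy.nanargmin and plain numpy.argmin. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

temps = pd.Series(data=[5, None, 24, 35, 49],
                  index=['Day-1', 'Day-2', 'Day-3', 'Day-4', 'Day-5'])
values = temps.to_numpy()   # None has become NaN, so the dtype is float64

label = temps.idxmin()                     # skipna=True by default
pos_nan_aware = int(np.nanargmin(values))  # ignores NaN, like idxmin()
pos_plain = int(np.argmin(values))         # NaN propagates: position of the NaN

print(label, pos_nan_aware, pos_plain)
```

Only numpy.nanargmin agrees with idxmin() here; plain numpy.argmin points at the missing entry.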
DataFrame reindex()
The reindex() method conforms a DataFrame/Series to a new index. Labels that did not exist in the original index receive NaN/NULL values, which the fill parameters (method, fill_value) can replace.
The syntax for this method is as follows:
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
Parameter | Description |
---|---|
labels | The new labels for the axis selected by axis. |
index | See below. |
columns | See below. |
axis | The axis that labels applies to: zero (0) or index for the rows, one (1) or columns for the columns. |
method | Option to use when filling in NaN/NULL values when a reindex occurs. Available options are None , pad /ffill , backfill /bfill , or nearest . Applies only when the index is monotonically increasing or decreasing. |
copy | If True , return a new DataFrame/Series even if the new index is the same as the current one. By default, True |
level | The integer/name of the level if working with MultiIndex . |
fill_value | Value to use for the missing values that reindexing introduces. By default, NaN. |
limit | The maximum number of consecutive elements to forward/backward fill. |
tolerance | Maximum distance between original labels and new labels for inexact matches. |
The DataFrame reindex() method has two (2) calling conventions:
(index=index_labels, columns=column_labels)
(labels, axis={'index', 'columns'})
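A minimal sketch of the two conventions, using a cut-down, hypothetical version of the inventory data. The names rows, cols, by_keyword, and by_axis are assumptions made for illustration.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops': [22, 12], 'Pants': [5, 7]}, index=['XS', 'S'])

rows = ['S', 'XS', 'M']     # 'M' is not in the original index
cols = ['Pants', 'Tops']

# Convention 1: keyword arguments name each axis directly
by_keyword = df_inv.reindex(index=rows, columns=cols)

# Convention 2: labels plus axis, one axis per call
by_axis = df_inv.reindex(rows, axis='index').reindex(cols, axis='columns')

print(by_keyword)
print(by_keyword.equals(by_axis))
```

Both calls yield the same frame; the new row M holds NaN because no fill option was given.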
For this example, Rivers Clothing wants to replace XL with XXS. Running the code below accomplishes this task.
df_inv = pd.DataFrame({'Tops': [22, 12, 19, 8, 23], 'Pants': [5, 7, 17, 19, 12], 'Coats': [11, 18, 1, 16, 3]}, index=['XS', 'S', 'M', 'L', 'XL'])
new_index = ['XXS', 'XS', 'S', 'M', 'L']
result = df_inv.reindex(new_index, fill_value=0)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
- Line [2] defines the new index labels (adding XXS and removing XL) and saves them to new_index.
- Line [3] does the following:
  - Conforms the DataFrame to the new index.
  - Fills the values of any new label with zeros (0).
  - Saves the output to result.
- Line [4] outputs the result to the terminal.
Output
 | Tops | Pants | Coats |
XXS | 0 | 0 | 0 |
XS | 22 | 5 | 11 |
S | 12 | 7 | 18 |
M | 19 | 17 | 1 |
L | 8 | 19 | 16 |
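The method and limit parameters are not exercised by the example above. The sketch below uses a small, hypothetical Series with a sorted integer index (method requires a monotonic index) to show forward filling with and without a limit.

```python
import pandas as pd

stock = pd.Series([10, 20, 30], index=[0, 2, 4])

# Forward fill: each new label takes the value of the closest earlier label
filled = stock.reindex(range(6), method='ffill')

# limit=1 allows a single consecutive fill, so label 5 is filled but label 6 stays NaN
limited = stock.reindex(range(7), method='ffill', limit=1)

print(filled)
print(limited)
```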
DataFrame reindex_like()
The reindex_like() method returns an object (DataFrame/Series) whose index and columns match those of another object (DataFrame/Series).
💡 Note: A new object (DataFrame/Series) is created unless the new index is the same as the current one and the copy parameter is False.
For this example, the DataFrames (df1 & df2) contain a 4-day and a 3-day forecast of daytime stats: Celsius, Fahrenheit, and Wind for df1, and Celsius and Wind for df2.
df1 = pd.DataFrame([[24, 115, 'extreme'], [31, 87, 'high'], [22, 65, 'medium'], [3, 9, 'low']], columns=['Cel.', 'Fah.', 'Wind'], index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))
df2 = pd.DataFrame([[8, 'low'], [3, 'low'], [54, 'medium']], columns=['Cel.', 'Wind'], index=pd.DatetimeIndex(['2014-02-12', '2014-02-13', '2014-02-15']))
print(df1)
result = df2.reindex_like(df1)
print(result)
- Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df1.
- Line [2] creates a DataFrame with Celsius and Wind for three (3) days using DatetimeIndex() and saves it to df2.
- Line [3] outputs df1 to the terminal.
- Line [4] performs a reindex_like() on df2 so that it takes on the index and columns of df1, and saves it to the result variable. Labels missing from df2 (the Fah. column and the 2014-02-14 row) fill with NaN.
- Line [5] outputs the result to the terminal.
Output
df1
 | Cel. | Fah. | Wind |
2014-02-12 | 24 | 115 | extreme |
2014-02-13 | 31 | 87 | high |
2014-02-14 | 22 | 65 | medium |
2014-02-15 | 3 | 9 | low |
result
 | Cel. | Fah. | Wind |
2014-02-12 | 8.0 | NaN | low |
2014-02-13 | 3.0 | NaN | low |
2014-02-14 | NaN | NaN | NaN |
2014-02-15 | 54.0 | NaN | medium |
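One way to think about reindex_like() is as a shorthand for calling reindex() with another object's index and columns. The sketch below checks that reading on two small, hypothetical frames (template and other are illustrative names, not from the example above).

```python
import pandas as pd

template = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['x', 'y', 'z'])
other = pd.DataFrame({'a': [10, 30]}, index=['x', 'z'])

# Borrow both the row labels and the column labels from template
via_like = other.reindex_like(template)

# The same request spelled out with reindex()
via_reindex = other.reindex(index=template.index, columns=template.columns)

print(via_like)
print(via_like.equals(via_reindex))
```

Row y and column b do not exist in other, so they come back as NaN in both results.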
DataFrame rename()
The rename() method changes the axis label(s) in a DataFrame/Series.
The syntax for this method is as follows:
DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')
Parameter | Description |
---|---|
mapper | Dictionary or function that transforms the labels. Use mapper together with axis to select which axis it applies to. |
index | Alternative to using axis: index=mapper is equivalent to mapper with axis=0. |
columns | Alternative to using axis: columns=mapper is equivalent to mapper with axis=1. |
axis | The axis that mapper targets: zero (0) or index for the row labels, one (1) or columns for the column labels. Default 0. |
copy | If set to True , the underlying data is also copied. This parameter is True by default. |
inplace | If set to True , the changes apply to the original DataFrame. If False , the changes apply to a new DataFrame. By default, False . |
level | For a MultiIndex , renames labels only in the specified level. |
errors | If set to 'raise' , a KeyError is raised when a dictionary mapper contains labels that are not present on the axis, else such labels are ignored. By default, 'ignore' . |
For this example, the columns of the same 4-day forecast DataFrame used above are renamed.
df = pd.DataFrame([[24, 115, 'extreme'], [31, 87, 'high'], [22, 65, 'medium'], [3, 9, 'low']], columns=['Cel.', 'Fah.', 'Wind'], index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))
result = df.rename(columns={"Cel.": "Celsius", "Fah.": "Fahrenheit"})
print(result)
- Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df.
- Line [2] renames the columns to those set out in the columns parameter and saves it to the result variable.
- Line [3] outputs the result to the terminal.
Output
 | Celsius | Fahrenheit | Wind |
2014-02-12 | 24 | 115 | extreme |
2014-02-13 | 31 | 87 | high |
2014-02-14 | 22 | 65 | medium |
2014-02-15 | 3 | 9 | low |
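The mapper does not have to be a dictionary. As a hedged sketch on a small, hypothetical frame, the code below renames the row labels with a function and shows how errors='raise' reacts to a label that does not exist. The names upper_rows and missing_label_raised are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Cel.': [24, 31], 'Fah.': [75, 88]}, index=['mon', 'tue'])

# A function mapper is applied to every label on the chosen axis
upper_rows = df.rename(index=str.upper)

# With errors='raise', a dictionary key that matches no label raises KeyError
try:
    df.rename(columns={'Kel.': 'Kelvin'}, errors='raise')
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(upper_rows.index.tolist())
print(missing_label_raised)
```

With the default errors='ignore', the second call would return the frame unchanged instead of raising.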
DataFrame rename_axis()
The rename_axis() method sets the name of the index or columns axis. Unlike rename(), it changes the name of the axis itself, not the labels on it.
The syntax for this method is as follows:
DataFrame.rename_axis(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)
Parameter | Description |
---|---|
mapper | The value to set the axis name. |
index | A scalar, list, dictionary, or function applied to the names of the index. |
columns | A scalar, list, dictionary, or function applied to the names of the columns. The columns parameter is ignored if the object is a Series. |
axis | The axis to rename: zero (0) or index, one (1) or columns. Default 0. |
copy | If set to True , the underlying data is also copied. This parameter is True by default. |
inplace | If set to True , the changes apply to the original DataFrame. If False , the changes apply to a new DataFrame. By default, False . |
For this example, the index of the same 4-day forecast DataFrame as above is given a name.
df = pd.DataFrame([[24, 115, 'extreme'], [31, 87, 'high'], [22, 65, 'medium'], [3, 9, 'low']], columns=['Cel.', 'Fah.', 'Wind'], index=pd.date_range(start='2014-02-12', end='2014-02-15', freq='D'))
result = df.rename_axis("Dates")
print(result)
- Line [1] creates a DataFrame with Celsius, Fahrenheit, and Wind for four (4) days, using date_range() and saves it to df.
- Line [2] names the index Dates and saves the output to the result variable.
- Line [3] outputs the result to the terminal.
Output
Dates | Cel. | Fah. | Wind |
2014-02-12 | 24 | 115 | extreme |
2014-02-13 | 31 | 87 | high |
2014-02-14 | 22 | 65 | medium |
2014-02-15 | 3 | 9 | low |
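To make the difference from rename() concrete, the sketch below names both axes of a shortened, hypothetical forecast frame and confirms that the column labels themselves are untouched. The axis name Stats is an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Cel.': [24, 31], 'Wind': ['extreme', 'high']},
                  index=pd.date_range(start='2014-02-12', periods=2, freq='D'))

# Name the index axis and the columns axis in one call
named = df.rename_axis(index='Dates', columns='Stats')

print(named.index.name, named.columns.name)
# The labels on each axis are unchanged; only the axis names were set
print(named.columns.tolist())
```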