Pandas DataFrame Indexing - Be on the Right Side of Change

The Pandas DataFrame is a data structure that organizes data into a two-dimensional format. If you are familiar with Excel or Databases, the setup is similar. Each DataFrame contains a schema that defines a Column (Field) Name and a Data Type.

This article delves into the methods available for DataFrame Indexing. This article also assumes Excel exists on the computer.

Preparation

Before any data manipulation can occur, one (1) new library will require installation.

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to Install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

Create a CSV File

For this article, a CSV file is read in. This file contains a small representation of Cars and their associated characteristics. To create this file, perform the following steps:

Using the mouse, highlight the data below.
Press CTRL+C to copy the contents to the system Clipboard.
Open Excel and click inside Cell A1.
Press CTRL+V to paste the contents of the system clipboard to the Worksheet.
From the File menu, click Save As and browse to select the current working directory.
From the Save As pop-up, enter the filename and select the file type indicated below.
Click the Save button to complete this task.

Data

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US
AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US
Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US
Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US
Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US
Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US
Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US
AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US
Citroen DS-21 Pallas;12.0;4;133.0;115.0;3090.;17.5;70;Europe
Volkswagen 1131 Deluxe Sedan;26.0;4;97.00;46.00;1835.;20.5;70;Europe
Peugeot 504;25.0;4;110.0;87.00;2672.;17.5;70;Europe
Audi 100 LS;24.0;4;107.0;90.00;2430.;14.5;70;Europe
Saab 99e;25.0;4;104.0;95.00;2375.;17.5;70;Europe
BMW 2002;26.0;4;121.0;113.0;2234.;12.5;70;Europe
AMC Gremlin;21.0;6;199.0;90.00;2648.;15.0;70;US
Ford F250;10.0;8;360.0;215.0;4615.;14.0;70;US

DataFrame head()

If a DataFrame contains many rows, this method comes in handy. This method offers the ability to view a selection of the top n rows in a DataFrame.

The syntax for this method is as follows:

DataFrame.head(n=5)

Parameter	Description
`n`	If empty, the first five (5) rows will display. If this value contains an integer, the first `n` number of rows will display. The number of rows does not include the header row.

For this example, we will display the top three (3) rows of the Cars DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
print(df.head(3))

Line [1] creates a list of the DataFrame columns to view.
Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (;).
- Assigning the view columns to the usecols parameter.
Line [3] outputs the top three (3) DataFrame rows, plus the header row to the terminal

Output

	Car	MPG	Weight	Model	Origin
0	Chevrolet Chevelle Malibu	18.0	3504.0	70	US
1	Buick Skylark	15.0	3693.0	70	US
2	Plymouth Satellite	18.0	3436.0	70	US

DataFrame at Property

Use the at property when a single value (string) for a row/column label pair is required. This property is similar to loc: both provide label-based lookups.

The syntax for this property is as follows:

DataFrame.at

This property accepts a row number/column name combination.
The row number is an integer. The column name is a string.

Code – Example 1

This example retrieves the Car name located on row 10 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.at[10, 'Car']
print(result)

Line [1] creates a list of the DataFrame columns to view.
Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (;).
- Assigning the view columns to the usecols parameter.
Line [3] retrieves the name in bold below using at().
Line [4] outputs the result to the terminal.

Output

	Car	MPG	Weight	Model	Origin
10	Citroen DS-21 Pallas	12.0	3090.0	70	Europe

Code – Example 2

This example retrieves the Car weight located on row 5 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.loc[5].at['Weight']
print(result)

Line [1] creates a list of the DataFrame columns to view.
Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (;).
- Assigning the view columns to the usecols parameter.
Line [3] retrieves the weight in bold below using loc() and at().
Line [4] outputs the result to the terminal.

Output

	Car	MPG	Weight	Model	Origin
5	Ford Galaxie 500	15.0	4341.0	70	US

DataFrame iat Property

Use the iat property when a single value (integer) for a row/column label pair is required. This property is similar to loc. Both provide integer-based lookups. If out of range, an IndexError occurs.

The syntax for this method is as follows:

DataFrame.iat

This property takes one parameter: a row and column number combination. Both row and column values must be integers.

Code – Example 1

This example retrieves the Car name located on row 11 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.iat[11, 0]
print(result)

Line [1] creates a list of the DataFrame columns to view.
Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (;).
- Assigning the view columns to the usecols parameter.
Line [3] retrieves the name in bold below at column 0, row 11 of the DataFrame using iat().
Line [4] outputs the result to the terminal.

Output

	Car	MPG	Weight	Model	Origin
11	Volkswagen 1131 Deluxe Sedan	26.0	1835.0	70	Europe

Code – Example 2

This example retrieves the Car weight located on row 12 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.loc[12].iat[2]
print(result)

Line [1-2] explanation of this code is outlined above.
Line [3] retrieves the weight in bold below at row 12, column 2 of the DataFrame using loc() and at().
Line [4] outputs the result to the terminal.

Output:

	Car	MPG	Weight	Model	Origin
12	Peugot 504	25.0	2672.0	70	Europe

DataFrame loc

Use this property to access a group of rows/columns by label(s) or a Boolean array. This property is label-based. A KeyError occurs if an item is missing.

The syntax for this property is as follows:

DataFrame.loc[input]

Acceptable Inputs

A single label
A list or array of labels
A slice object with labels
A boolean array
A boolean series
An alignable Index. The index of the returned selection will be the input
A callable function

This example retrieves the Car details from row 8 of the DataFrame using loc().

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.loc[8]
print(result)

Line [1] creates a list of the DataFrame columns to view.
Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (;).
- Assigning the view columns to the usecols parameter.
Line [3] retrieves the information from line 8 of the DataFrame.
Line [4] outputs this to the terminal.

Output

Car	Pontiac Catalina
MPG	14.0
Weight	4425.0
Model	70
Origin	US

💡 Note: As noted above, there are numerous ways to retrieve the data. Click here for additional details.

DataFrame iloc

This property is integer-location-based indexing based on selection by position. The position base is 0 to length-1 of the axis. An IndexError occurs if an index is out-of-bounds, except slice indexers: they allow out-of-bounds indexing.

The syntax for this property is as follows:

DataFrame.iloc[input]

Acceptable Inputs

An integer
A list/array of integers
A slice object with integers
A boolean array
A callable function

Code

In this example, we display Cars whose origin is Europe. Below is one way to accomplish this task.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.iloc[10:15]
print(result)

Line [1] creates a list of the DataFrame columns to view.
Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (;).
- Assigning the view columns to the usecols parameter.
Line [3] retrieves the information from lines 10 to 14 of the DataFrame.
Line [4] outputs the result to the terminal.

Output

	Car	MPG	Weight	Model	Origin
10	Citroen DS-21 Pallas	12.0	3090.0	70	Europe
11	11 Volkswagen 1131 Deluxe Sedan	26.0	1835.0	70	Europe
12	Peugeot 504	25.0	2672.0	70	Europe
13	Audi 100 LS	24.0	2430.0	70	Europe
14	Saab 99e	25.0	2375.0	70	Europe