Pandas DataFrame Indexing

Rate this post

The Pandas DataFrame is a data structure that organizes data into a two-dimensional format. If you are familiar with Excel or Databases, the setup is similar. Each DataFrame contains a schema that defines a Column (Field) Name and a Data Type.  

This article delves into the methods available for DataFrame Indexing. This article also assumes Excel exists on the computer.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

Create a CSV File

For this article, a CSV file is read in. This file contains a small representation of Cars and their associated characteristics. To create this file, perform the following steps:

  • Using the mouse, highlight the data below.
  • Press CTRL+C to copy the contents to the system Clipboard.
  • Open Excel and click inside Cell A1.
  • Press CTRL+V to paste the contents of the system clipboard to the Worksheet.
  • From the File menu, click Save As and browse to select the current working directory.
  • From the Save As pop-up, enter the filename and select the file type indicated below.
  • Click the Save button to complete this task.

Data

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US
AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US
Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US
Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US
Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US
Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US
Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US
AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US
Citroen DS-21 Pallas;12.0;4;133.0;115.0;3090.;17.5;70;Europe
Volkswagen 1131 Deluxe Sedan;26.0;4;97.00;46.00;1835.;20.5;70;Europe
Peugeot 504;25.0;4;110.0;87.00;2672.;17.5;70;Europe
Audi 100 LS;24.0;4;107.0;90.00;2430.;14.5;70;Europe
Saab 99e;25.0;4;104.0;95.00;2375.;17.5;70;Europe
BMW 2002;26.0;4;121.0;113.0;2234.;12.5;70;Europe
AMC Gremlin;21.0;6;199.0;90.00;2648.;15.0;70;US
Ford F250;10.0;8;360.0;215.0;4615.;14.0;70;US

DataFrame head()

If a DataFrame contains many rows, this method comes in handy. This method offers the ability to view a selection of the top n rows in a DataFrame.

The syntax for this method is as follows:

DataFrame.head(n=5)
ParameterDescription
nIf empty, the first five (5) rows will display. If this value contains an integer, the first n number of rows will display. The number of rows does not include the header row.

For this example, we will display the top three (3) rows of the Cars DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
print(df.head(3))
  • Line [1] creates a list of the DataFrame columns to view.
  • Line [2] creates a DataFrame by:
    • Reading in the CSV file created earlier.
    • Assigning the field separator character from the CSV file: the semi-colon (;).
    • Assigning the view columns to the usecols parameter.
  • Line [3] outputs the top three (3) DataFrame rows, plus the header row to the terminal

Output

 CarMPGWeightModelOrigin
0Chevrolet Chevelle Malibu18.03504.070US
1Buick Skylark15.03693.070US
2Plymouth Satellite18.03436.070US

DataFrame at Property

Use the at property when a single value (string) for a row/column label pair is required. This property is similar to loc: both provide label-based lookups.

The syntax for this property is as follows:

DataFrame.at
  • This property accepts a row number/column name combination.
  • The row number is an integer. The column name is a string.

Code – Example 1

This example retrieves the Car name located on row 10 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.at[10, 'Car']
print(result)
  • Line [1] creates a list of the DataFrame columns to view.
  • Line [2] creates a DataFrame by:
    • Reading in the CSV file created earlier.
    • Assigning the field separator character from the CSV file: the semi-colon (;).
    • Assigning the view columns to the usecols parameter.
  • Line [3] retrieves the name in bold below using at().
  • Line [4] outputs the result to the terminal.

Output

 CarMPGWeightModelOrigin
10Citroen DS-21 Pallas12.03090.070Europe

Code – Example 2

This example retrieves the Car weight located on row 5 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.loc[5].at['Weight']
print(result)
  • Line [1] creates a list of the DataFrame columns to view.
  • Line [2] creates a DataFrame by:
    • Reading in the CSV file created earlier.
    • Assigning the field separator character from the CSV file: the semi-colon (;).
    • Assigning the view columns to the usecols parameter.
  • Line [3] retrieves the weight in bold below using loc() and at().
  • Line [4] outputs the result to the terminal.

Output

 CarMPGWeightModelOrigin
5Ford Galaxie 50015.04341.070US

DataFrame iat Property

Use the iat property when a single value (integer) for a row/column label pair is required. This property is similar to loc. Both provide integer-based lookups. If out of range, an IndexError occurs.

The syntax for this method is as follows:

DataFrame.iat
  • This property takes one parameter: a row and column number combination. Both row and column values must be integers.

Code – Example 1

This example retrieves the Car name located on row 11 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.iat[11, 0]
print(result)
  • Line [1] creates a list of the DataFrame columns to view.
  • Line [2] creates a DataFrame by:
    • Reading in the CSV file created earlier.
    • Assigning the field separator character from the CSV file: the semi-colon (;).
    • Assigning the view columns to the usecols parameter.
  • Line [3] retrieves the name in bold below at column 0, row 11 of the DataFrame using iat().
  • Line [4] outputs the result to the terminal.

Output

 CarMPGWeightModelOrigin
11Volkswagen 1131 Deluxe Sedan26.01835.070Europe

Code – Example 2

This example retrieves the Car weight located on row 12 of the DataFrame.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.loc[12].iat[2]
print(result)
  • Line [1-2] explanation of this code is outlined above.
  • Line [3] retrieves the weight in bold below at row 12, column 2 of the DataFrame using loc() and at().
  • Line [4] outputs the result to the terminal.

Output:

 CarMPGWeightModelOrigin
12Peugot 50425.02672.070Europe

DataFrame loc

Use this property to access a group of rows/columns by label(s) or a Boolean array. This property is label-based. A KeyError occurs if an item is missing.

The syntax for this property is as follows:

DataFrame.loc[input]

Acceptable Inputs

  • A single label
  • A list or array of labels
  • A slice object with labels
  • A boolean array
  • A boolean series
  • An alignable Index. The index of the returned selection will be the input
  • A callable function

This example retrieves the Car details from row 8 of the DataFrame using loc().

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.loc[8]
print(result)
  • Line [1] creates a list of the DataFrame columns to view.
  • Line [2] creates a DataFrame by:
    • Reading in the CSV file created earlier.
    • Assigning the field separator character from the CSV file: the semi-colon (;).
    • Assigning the view columns to the usecols parameter.
  • Line [3] retrieves the information from line 8 of the DataFrame.
  • Line [4] outputs this to the terminal.

Output

CarPontiac Catalina
MPG14.0
Weight4425.0
Model70
OriginUS

💡 Note: As noted above, there are numerous ways to retrieve the data. Click here for additional details.


DataFrame iloc

This property is integer-location-based indexing based on selection by position. The position base is 0 to length-1 of the axis. An IndexError occurs if an index is out-of-bounds, except slice indexers: they allow out-of-bounds indexing.

The syntax for this property is as follows:

DataFrame.iloc[input]

Acceptable Inputs

  • An integer
  • A list/array of integers
  • A slice object with integers
  • A boolean array
  • A callable function

Code

In this example, we display Cars whose origin is Europe. Below is one way to accomplish this task.

cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin']
df = pd.read_csv('cars.csv', sep=';', usecols=cols)
result = df.iloc[10:15]
print(result) 
  • Line [1] creates a list of the DataFrame columns to view.
  • Line [2] creates a DataFrame by:
    • Reading in the CSV file created earlier.
    • Assigning the field separator character from the CSV file: the semi-colon (;).
    • Assigning the view columns to the usecols parameter.
  • Line [3] retrieves the information from lines 10 to 14 of the DataFrame.
  • Line [4] outputs the result to the terminal.

Output

 CarMPGWeightModelOrigin
10Citroen DS-21 Pallas 12.03090.070Europe
1111 Volkswagen 1131 Deluxe Sedan 26.01835.070Europe
12Peugeot 50425.02672.070Europe
13Audi 100 LS24.02430.070Europe
14Saab 99e25.02375.070Europe