The Pandas DataFrame is a data structure that organizes data into a two-dimensional format. If you are familiar with Excel or Databases, the setup is similar. Each DataFrame contains a schema that defines a Column (Field) Name and a Data Type.
This article delves into the methods available for DataFrame Indexing. This article also assumes Excel exists on the computer.
Preparation
Before any data manipulation can occur, one (1) new library will require installation.
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($
), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($
). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter>
key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
Create a CSV File
For this article, a CSV file is read in. This file contains a small representation of Cars and their associated characteristics. To create this file, perform the following steps:
- Using the mouse, highlight the data below.
- Press
CTRL+C
to copy the contents to the system Clipboard. - Open Excel and click inside Cell
A1
. - Press
CTRL+V
to paste the contents of the system clipboard to the Worksheet. - From the
File
menu, clickSave As
and browse to select the current working directory. - From the
Save As
pop-up, enter the filename and select the file type indicated below. - Click the
Save
button to complete this task.
Data
Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US Citroen DS-21 Pallas;12.0;4;133.0;115.0;3090.;17.5;70;Europe Volkswagen 1131 Deluxe Sedan;26.0;4;97.00;46.00;1835.;20.5;70;Europe Peugeot 504;25.0;4;110.0;87.00;2672.;17.5;70;Europe Audi 100 LS;24.0;4;107.0;90.00;2430.;14.5;70;Europe Saab 99e;25.0;4;104.0;95.00;2375.;17.5;70;Europe BMW 2002;26.0;4;121.0;113.0;2234.;12.5;70;Europe AMC Gremlin;21.0;6;199.0;90.00;2648.;15.0;70;US Ford F250;10.0;8;360.0;215.0;4615.;14.0;70;US
DataFrame head()
If a DataFrame contains many rows, this method comes in handy. This method offers the ability to view a selection of the top n
rows in a DataFrame.
The syntax for this method is as follows:
DataFrame.head(n=5)
Parameter | Description |
---|---|
n | If empty, the first five (5) rows will display. If this value contains an integer, the first n number of rows will display. The number of rows does not include the header row. |
For this example, we will display the top three (3) rows of the Cars DataFrame.
cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin'] df = pd.read_csv('cars.csv', sep=';', usecols=cols) print(df.head(3))
- Line [1] creates a list of the DataFrame columns to view.
- Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (
;
). - Assigning the view columns to the
usecols
parameter.
- Line [3] outputs the top three (3) DataFrame rows, plus the header row to the terminal
Output
Car | MPG | Weight | Model | Origin | |
0 | Chevrolet Chevelle Malibu | 18.0 | 3504.0 | 70 | US |
1 | Buick Skylark | 15.0 | 3693.0 | 70 | US |
2 | Plymouth Satellite | 18.0 | 3436.0 | 70 | US |
DataFrame at Property
Use the at
property when a single value (string) for a row/column label pair is required. This property is similar to loc
: both provide label-based lookups.
The syntax for this property is as follows:
DataFrame.at
- This property accepts a row number/column name combination.
- The row number is an integer. The column name is a string.
Code β Example 1
This example retrieves the Car
name located on row 10 of the DataFrame.
cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin'] df = pd.read_csv('cars.csv', sep=';', usecols=cols) result = df.at[10, 'Car'] print(result)
- Line [1] creates a list of the DataFrame columns to view.
- Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (
;
). - Assigning the view columns to the
usecols
parameter.
- Line [3] retrieves the name in bold below using
at()
. - Line [4] outputs the result to the terminal.
Output
Car | MPG | Weight | Model | Origin | |
10 | Citroen DS-21 Pallas | 12.0 | 3090.0 | 70 | Europe |
Code β Example 2
This example retrieves the Car weight located on row 5 of the DataFrame.
cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin'] df = pd.read_csv('cars.csv', sep=';', usecols=cols) result = df.loc[5].at['Weight'] print(result)
- Line [1] creates a list of the DataFrame columns to view.
- Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (
;
). - Assigning the view columns to the
usecols
parameter.
- Line [3] retrieves the weight in bold below using
loc()
andat()
. - Line [4] outputs the result to the terminal.
Output
Car | MPG | Weight | Model | Origin | |
5 | Ford Galaxie 500 | 15.0 | 4341.0 | 70 | US |
DataFrame iat Property
Use the iat
property when a single value (integer) for a row/column label pair is required. This property is similar to loc
. Both provide integer-based lookups. If out of range, an IndexError
occurs.
The syntax for this method is as follows:
DataFrame.iat
- This property takes one parameter: a row and column number combination. Both row and column values must be integers.
Code β Example 1
This example retrieves the Car name located on row 11 of the DataFrame.
cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin'] df = pd.read_csv('cars.csv', sep=';', usecols=cols) result = df.iat[11, 0] print(result)
- Line [1] creates a list of the DataFrame columns to view.
- Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (
;
). - Assigning the view columns to the
usecols
parameter.
- Line [3] retrieves the name in bold below at column 0, row 11 of the DataFrame using
iat()
. - Line [4] outputs the result to the terminal.
Output
Car | MPG | Weight | Model | Origin | |
11 | Volkswagen 1131 Deluxe Sedan | 26.0 | 1835.0 | 70 | Europe |
Code β Example 2
This example retrieves the Car weight located on row 12 of the DataFrame.
cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin'] df = pd.read_csv('cars.csv', sep=';', usecols=cols) result = df.loc[12].iat[2] print(result)
- Line [1-2] explanation of this code is outlined above.
- Line [3] retrieves the weight in bold below at row 12, column 2 of the DataFrame using
loc()
andat()
. - Line [4] outputs the result to the terminal.
Output:
Car | MPG | Weight | Model | Origin | |
12 | Peugot 504 | 25.0 | 2672.0 | 70 | Europe |
DataFrame loc
Use this property to access a group of rows/columns by label(s) or a Boolean array. This property is label-based. A KeyError
occurs if an item is missing.
The syntax for this property is as follows:
DataFrame.loc[input]
Acceptable Inputs
- A single label
- A list or array of labels
- A slice object with labels
- A boolean array
- A boolean series
- An alignable Index. The index of the returned selection will be the input
- A callable function
This example retrieves the Car details from row 8 of the DataFrame using loc()
.
cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin'] df = pd.read_csv('cars.csv', sep=';', usecols=cols) result = df.loc[8] print(result)
- Line [1] creates a list of the DataFrame columns to view.
- Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (
;
). - Assigning the view columns to the
usecols
parameter.
- Line [3] retrieves the information from line 8 of the DataFrame.
- Line [4] outputs this to the terminal.
Output
Car | Pontiac Catalina |
MPG | 14.0 |
Weight | 4425.0 |
Model | 70 |
Origin | US |
π‘ Note: As noted above, there are numerous ways to retrieve the data. Click here for additional details.
DataFrame iloc
This property is integer-location-based indexing based on selection by position. The position base is 0 to length-1 of the axis. An IndexError
occurs if an index is out-of-bounds, except slice indexers: they allow out-of-bounds indexing.
The syntax for this property is as follows:
DataFrame.iloc[input]
Acceptable Inputs
- An integer
- A list/array of integers
- A slice object with integers
- A boolean array
- A callable function
Code
In this example, we display Cars whose origin is Europe. Below is one way to accomplish this task.
cols = ['Car', 'MPG', 'Weight', 'Model', 'Origin'] df = pd.read_csv('cars.csv', sep=';', usecols=cols) result = df.iloc[10:15] print(result)
- Line [1] creates a list of the DataFrame columns to view.
- Line [2] creates a DataFrame by:
- Reading in the CSV file created earlier.
- Assigning the field separator character from the CSV file: the semi-colon (
;
). - Assigning the view columns to the
usecols
parameter.
- Line [3] retrieves the information from lines 10 to 14 of the DataFrame.
- Line [4] outputs the result to the terminal.
Output
Car | MPG | Weight | Model | Origin | |
10 | Citroen DS-21 Pallas | 12.0 | 3090.0 | 70 | Europe |
11 | 11 Volkswagen 1131 Deluxe Sedan | 26.0 | 1835.0 | 70 | Europe |
12 | Peugeot 504 | 25.0 | 2672.0 | 70 | Europe |
13 | Audi 100 LS | 24.0 | 2430.0 | 70 | Europe |
14 | Saab 99e | 25.0 | 2375.0 | 70 | Europe |