Pandas DataFrame Methods

Rate this post

The Pandas DataFrame is a data structure that organizes data into a two-dimensional format. If you are familiar with Excel or Databases, the setup is similar. Each DataFrame contains a schema that defines a Column (Field) Name and a Data Type.  

This article delves into the methods available for DataFrame Iteration. This article also assumes Excel exists on the computer.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

Create a CSV File

For this article, we will create a CSV file. This file contains a small representation of 2020 Fire Incidents in the San Diego area. To complete this task, perform the following steps:

  • Using the mouse, highlight the data below.
  • Press CTRL+C to copy the contents to the system Clipboard.
  • Open Notepad.
  • Press CTRL+V to paste the contents of the system Clipboard to Notepad.
  • From the File menu, click Save As and browse to select the current working directory.
  • From the Save As pop-up, enter the filename and type as indicated below.
  • Click the Save button to complete this task.

Data

Call;Incident;Date;Time;Zip
Life-Threatening Emergency Response;FS21198050;11/30/2021;23:56;92131
Urgent Response;FS21198048;11/30/2021;23:53;92106
Non-Life-Threatening Response;FS21198047;11/30/2021;23:43;92101
Urgent Response;FS21198045;11/30/2021;23:35;92102
Life-Threatening Emergency Response;FS21198046;11/30/2021;23:34;92108
Life-Threatening Emergency Response;FS21198040;11/30/2021;23:26;92110
Non-Life-Threatening Response;FS21198039;11/30/2021;23:24;92037
Life-Threatening Emergency Response;FS21198038;11/30/2021;23:14;92113
Urgent Response;FS21198037;11/30/2021;23:14;92105
Fire;FS21198036;11/30/2021;23:12;92110
 

DataFrame pop()

The pop() method drops the referenced item from the DataFrame. If not located, a KeyError occurs.

The syntax of this method is as follows:

DataFrame.pop(item)
ParameterDescription
itemThis parameter is the Column Label (Name) to be popped (removed).

This example pops (removes) the Date column from the DataFrame.

df = pd.read_csv('fire.csv', sep=';')
df.pop('Date')
print(df)
  • Line [1] reads in the CSV file created earlier. Notice the field separator character is the semi-colon (;).
  • Line [2] pops (removes) the Date column from the DataFrame.
  • Line [3] outputs the DataFrame to the terminal.

Output

 CallIncidentTimeZip
0Life-Threatening Emergency ResponseFS21198050 23:56 92131
1Urgent Response FS21198048 23:53 92106
2Life-Threatening Emergency ResponseFS21198047 23:43 92101
3Urgent Response FS21198045 23:35 92102
4Life-Threatening Emergency ResponseFS21198046 23:34 92108
5Life-Threatening Emergency ResponseFS21198040 23:26 92110
6Non-Life-Threatening Emergency ResponseFS21198039 23:24 92037
7Life-Threatening Emergency ResponseFS21198038 23:14 92113
8Urgent Response FS21198037 23:14 92105
9FireFS21198036 23:1292110

DataFrame tail()

If a DataFrame contains many rows, this method comes in handy. This method offers the ability to view a selection of the bottom n rows of a DataFrame.

The syntax of this method is as follows:

DataFrame.tail(n=5)
ParameterDescription
nIf empty, the bottom five (n=5) rows will display. If this parameter contains a value, the bottom n number of rows will display. The number of rows does not include the header row.

For this example, we will display the bottom three (3) rows of the DataFrame.

df = pd.read_csv('fire.csv', sep=';')
print(df.tail(3))
  • Line [1] reads in the CSV file created earlier. Notice the field separator character is the semi-colon (;).
  • Line [2] outputs the bottom three (3) DataFrame rows, plus the header row to the terminal.

Output

 CallIncidentDateTimeZip
7Life-Threatening Emergency ResponseFS21198038 11/30/202123:14 92113
8Urgent Response FS21198037 11/30/202123:14 92105
9FireFS21198036 11/30/202123:1292110

DataFrame get()

The get() method gets (retrieves) an item from an object for a specified key. This key (Column) might be a single key (Column) or multiple keys (Columns). A default value of None returns for no match.

The syntax of this method is as follows:

DataFrame.get(key, default=None)
ParameterDescription
keyThis is an object.
valueThis is the same type as the items in the object.

For this example, the values in the Zip column are returned.

df = pd.read_csv('fire.csv', sep=';')
results = df.get("Zip")
print(results)
  • Line [1] reads in the CSV file created earlier. Notice the field separator character is the semi-colon (;).
  • Line [2] saves the Zip values to results.
  • Line [3] outputs the results to the terminal.

Output

092131
192106
292101
392102
492108
592110
692037
792113
892105
992110
Name:Zip, dtype: int64

DataFrame isin()

The isin() method determines if each element in the DataFrame exists in the values parameter. This method returns a DataFrame of Boolean values (True/False) indicating the same.

The syntax of this method is as follows:

DataFrame.isin(values)
ParameterDescription
valuesThis is an iterable and can be a Series, DataFrame, or Dictionary. If the iterable is a Dictionary, the keys must contain and match the Column Labels (Names). If a DataFrame, the index and Column Labels (Names) must match.

This example uses a DataFrame.

df = pd.read_csv('fire.csv', sep=';')
results = df.isin([92108, 92102])
print(results)
  • Line [1] reads in the CSV file created earlier. Notice the field separator character is the semi-colon (;).
  • Line [2] checks for two Zip values and saves the Boolean values to results.
  • Line [3] outputs the results to the terminal.

Output

 CallIncidentTimeZip
0FalseFalseFalseFalse
1FalseFalseFalseFalse
2FalseFalseFalseFalse
3FalseFalseFalseTrue
4FalseFalseFalseTrue
5FalseFalseFalseFalse
6FalseFalseFalseFalse
7FalseFalseFalseFalse
8FalseFalseFalseFalse
9FalseFalseFalseFalse

Click here for examples using a Dictionary and a Series.


DataFrame where()

The where() method replaces values when cond=False. This method is similar to the if-then statement. For each element in the DataFrame, if cond=True, the element value is used. Otherwise, the match from the other value is assigned.

The syntax of this method is as follows:

DataFrame.where(cond, other=nan, inplace=False, axis=None, 
                level=None, errors='raise', try_cast=NoDefault.no_default)
ParameterDescription
condThe allowable items are Boolean, Series, DataFrame, Array, or Callable. Based on the following conditions:
– Where cond=True, the original value is kept.
– Where cond=False, replace with the corresponding value from other (see below).
– If callable, a Boolean Series/DataFrame or Array returns.
otherEntries where cond=False this is changed to the corresponding value from other. If other is callable, this returns a scalar or Series/DataFrame.
inplace– If inplace=True, the DataFrame is updated.
– If inplace=False, the DataFrame is updated, and a copy of the DataFrame returns.
Default is False.
axisDetermines the alignment axis. The default value is None.
levelDetermines the alignment level. The default value is None.
errorsThis parameter does not affect the results.

For this example, this code converts the Zip values that are not 92108 to 90210.

df = pd.read_csv('fire.csv', sep=';')
results = df.where(df == 92108, '90210')
print(results)
  • Line [1] reads in the CSV file created earlier. Notice the field separator character is the semi-colon (;).
  • Line [2] changes all Zip values that do not match the filter criteria and assigns this to results.
  • Line [3] outputs the results to the terminal.

Output

 CallIncidentTimeZip
090210902109021090210
190210902109021090210
290210902109021090210
390210902109021090210
490210902109021092108
590210902109021090210
690210902109021090210
790210902109021090210
890210902109021090210
990210902109021090210

DataFrame mask()

The mask() method replaces values where cond=True. This method is similar to the if-then statement. For each element in the DataFrame, if cond=False, the element value is assigned. Otherwise, the match from the other value is accessed.

The syntax of this method is as follows:

DataFrame.mask(cond, other=nan, inplace=False, axis=None, level=None, 
               errors='raise', try_cast=NoDefault.no_default)
ParameterDescription
condThe allowable items are Boolean, Series, DataFrame, Array, or Callable. Based on the following conditions:
– Where cond=True, the original value is kept.
– Where cond=False, replace with the corresponding value from other (see below).
– If callable, a Boolean Series/DataFrame or Array returns.
otherEntries where cond=False this is changed to the corresponding value from other. If other is callable, this returns a scalar or Series/DataFrame.
inplace– If inplace=True, the DataFrame is updated.
– If inplace=False, the DataFrame is updated, and a copy of the DataFrame returns.
Default is False.
axisDetermines the alignment axis. The default value is None.
levelDetermines the alignment level. The default value is None.
errorsThis parameter does not affect the results.

For this example, the Zip value 92131 changes to 9000.

df = pd.read_csv('fire.csv', sep=';')
results = df.mask(df == 92131, '90000')
print(results)
  • Line [1] reads in the CSV file created earlier. Notice the field separator character is the semi-colon (;).
  • Line [2] searches for and changes the matching field. No other data is modified.
  • Line [3] outputs the results to the terminal.

Output

 CallIncidentDateTimeZip
0Life-Threatening Emergency ResponseFS21198050 11/30/202123:56 90000
1Urgent Response FS21198048 11/30/202123:53 92106
2Life-Threatening Emergency ResponseFS21198047 11/30/202123:43 92101
3Urgent Response FS21198045 11/30/202123:35 92102
4Life-Threatening Emergency ResponseFS21198046 11/30/202123:34 92108
5Life-Threatening Emergency ResponseFS21198040 11/30/202123:26 92110
6Non-Life-Threatening Emergency ResponseFS21198039 11/30/202123:24 92037
7Life-Threatening Emergency ResponseFS21198038 11/30/202123:14 92113
8Urgent Response FS21198037 11/30/202123:14 92105
9FireFS21198036 11/30/202123:1292110

DataFrame query()

The query() method queries the columns of a DataFrame using a boolean expression. The return value is a DataFrame resulting from the provided query expression or None if inplace=True.

The syntax of this method is as follows:

DataFrame.query(expr, inplace=False, **kwargs)
ParameterDescription
exprThis is the query string to be evaluated. You can refer to variables inside the environment by pre-fixing with an @ character. Refer to invalid Python Column Labels (Names) by surrounding them in back-ticks.
inplaceThis value determines if this executes in place and returns a copy
**kwargsClick to see the documentation on eval().

For this example, all Zip values greater than 92108 return.

df = pd.read_csv('fire.csv', sep=';')
results = df.query('Zip > 92108')
print(results)
  • Line [1] reads in the CSV file created earlier. Notice the field separator character is the semi-colon (;).
  • Line [2] searches for and returns the results that match the query entered.
  • Line [3] outputs the results to the terminal.

Output

 CallIncidentDateTimeZip
0Life-Threatening Emergency ResponseFS21198050 11/30/202123:56 92131
5Life-Threatening Emergency ResponseFS21198040 11/30/202123:26 92110
7Life-Threatening Emergency ResponseFS21198038 11/30/202123:14 92113
9FireFS21198036 11/30/202123:1292110

DataFrame xs()

The xs() method takes a Key argument to select data at a particular level of a MultiIndex and returns a cross-section from the Series/DataFrame.

The syntax of this method is as follows:

DataFrame.xs(key, axis=0, level=None, drop_level=True)
ParameterDescription
keyThe label contained in the index or partially in a MultiIndex.
axisAxis to retrieve the cross-section of data from (0/1). Default is 0.
levelIndicates which levels of a MultiIndex to use.
drop_levelIf False, an object returns with the same levels as the original. Default is True.

For this example, we have created a DataFrame containing information about a High School.

school = {'stds_class': 	[14, 13, 18, 21],
	   'avg_class':  	[83, 72, 65, 91],
	   'classes':         	['math', 'science', 'history', 'english'],
	   'bldg':          	['C3', 'B1', 'F6', 'M2'],
	   'teacher':   	['hill', 'watts', 'sufra', 'brown']}

df = pd.DataFrame(school)
df = df.set_index(['classes', 'bldg', 'teacher'])
  • Line [1] creates a new DataFrame, school.
  • Line [2] assigns this DataFrame to df.
  • Line [3] sets the indexes to classes, bldg, and teacher.
  • Line [4] outputs the DataFrame to the terminal.

Output

Below is the DataFrame with no modifications.

                      stds_class  avg_class
classes bldg teacher                       
math    C3   hill             14         83
science B1   watts            13         72
history F6   sufra            18         65
english M2   brown            21         91

The following code applies the xs() method.

df = pd.DataFrame(school)
df = df.set_index(['classes', 'bldg', 'teacher'])
print(df.xs('science'))

Line [3] outputs the details for science to the terminal.

Output

              stds_class  avg_class
bldg teacher                       
B1   watts            13         72