Pandas DataFrame Comparison Operators and Combine – Part 3

The Pandas DataFrame has several binary operator methods. Each method compares or combines the DataFrame with another object (a scalar, a list, a Series, or a second DataFrame) and returns a new DataFrame containing the result.
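Each comparison method covered below has an operator counterpart. The minimal sketch here assumes a small hypothetical prices DataFrame and checks that, for a scalar argument, each method returns the same Boolean DataFrame as its operator.

```python
import pandas as pd

# Hypothetical sample data (not from the original article)
prices = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89]})

# Each comparison method name paired with the result of its operator form
pairs = {
    'lt': prices < 30,
    'gt': prices > 30,
    'le': prices <= 30,
    'ge': prices >= 30,
    'ne': prices != 30,
    'eq': prices == 30,
}

for name, operator_result in pairs.items():
    method_result = getattr(prices, name)(30)
    # DataFrame.equals compares shape, labels, and values
    print(name, method_result.equals(operator_result))
```

The method forms are still useful because they accept the axis and level parameters, which the operators do not expose.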

This is Part 3 of a series on Pandas DataFrame operators.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, open a terminal (for example, the one built into your IDE) and execute the command below at the command prompt. In this example, the command prompt is a dollar sign ($); your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation is successful, the terminal displays a message confirming it.
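One way to double-check the installation is to import the library from Python and print its version. This is a minimal sketch; the version number printed depends on your environment.

```python
import pandas as pd

# If this import succeeds, pandas is installed in the active environment
print(pd.__version__)
```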


Feel free to view the PyCharm installation guide for the required library.


Add the following line to the top of each code snippet so that the code in this article runs without errors.

import pandas as pd

DataFrame Less Than

The lt() method is one of the comparison operators. It tests whether each DataFrame element is less than (<) the value passed as the first parameter (other).

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.lt(other, axis='columns', level=None)
Parameter | Description
--------- | -----------
other | Any single or multiple element data structure, such as a scalar, a list or other list-like object, a Series, or a DataFrame.
axis | If zero (0) or 'index', other is aligned with the row index and applied to each column. If one (1) or 'columns', other is aligned with the column labels and applied to each row. Default 'columns'.
level | An integer or a label. Broadcasts across the specified level, matching the index values on the passed MultiIndex level.

For this example, we will be using Rivers Clothing to test for item prices less than 45.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.lt(45)
print(result)
  • Line [1] creates a DataFrame from a Dictionary and saves it to df.
  • Line [2] compares each element and tests to see if the item price is less than 45. A True/False value is assigned based on the outcome.
  • Line [3] outputs the result to the terminal.

Output

   Tops  Coats  Pants  Tanks  Sweats
0  True   True   True   True    True
1  True  False  False   True    True
2  True  False  False   True    True
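The Boolean DataFrame returned by lt() can also serve as a mask. In this minimal sketch, reusing the Rivers Clothing data, DataFrame.where() keeps the prices where the mask is True and replaces the rest with NaN, and summing the mask counts the matches per column.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

mask = df.lt(45)          # Boolean DataFrame with the same shape as df
cheap = df.where(mask)    # keep values where mask is True, otherwise NaN
print(cheap)

# Number of items priced under 45 in each column (True counts as 1)
print(mask.sum())
```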

DataFrame Greater Than

The gt() method is one of the comparison operators. It tests whether each DataFrame element is greater than (>) the value passed as the first parameter (other).

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.gt(other, axis='columns', level=None)
Parameter | Description
--------- | -----------
other | Any single or multiple element data structure, such as a scalar, a list or other list-like object, a Series, or a DataFrame.
axis | If zero (0) or 'index', other is aligned with the row index and applied to each column. If one (1) or 'columns', other is aligned with the column labels and applied to each row. Default 'columns'.
level | An integer or a label. Broadcasts across the specified level, matching the index values on the passed MultiIndex level.
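Because other can be list-like, each column can have its own threshold. This is a minimal sketch with hypothetical per-category thresholds: with the default axis='columns', the list is matched to the columns in order, so its length must equal the number of columns.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

# Hypothetical thresholds, one per column: Tops, Coats, Pants, Tanks, Sweats
thresholds = [20, 50, 50, 15, 30]

# axis='columns' (the default) pairs each list element with a column
result = df.gt(thresholds, axis='columns')
print(result)
```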

For this example, we will be using Rivers Clothing to test for items that cost more than 25.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.gt(25)
print(result)
  • Line [1] creates a DataFrame from a Dictionary and saves it to df.
  • Line [2] compares each element and tests to see if the item price is greater than 25. A True/False value is assigned based on the outcome.
  • Line [3] outputs the result to the terminal.

Output

    Tops  Coats  Pants  Tanks  Sweats
0  False   True  False  False    True
1  False   True   True  False   False
2  False   True   True  False    True

DataFrame Less Than or Equal To

The le() method is one of the comparison operators. It tests whether each DataFrame element is less than or equal to (<=) the value passed as the first parameter (other).

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.le(other, axis='columns', level=None)
Parameter | Description
--------- | -----------
other | Any single or multiple element data structure, such as a scalar, a list or other list-like object, a Series, or a DataFrame.
axis | If zero (0) or 'index', other is aligned with the row index and applied to each column. If one (1) or 'columns', other is aligned with the column labels and applied to each row. Default 'columns'.
level | An integer or a label. Broadcasts across the specified level, matching the index values on the passed MultiIndex level.
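Setting axis='index' changes how other is aligned. In this minimal sketch, a hypothetical Series holds one budget per row, and each row of the DataFrame is compared with its own budget, matched on the row index.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

# Hypothetical per-row budgets, indexed like the rows of df (0, 1, 2)
budgets = pd.Series([20, 60, 30], index=[0, 1, 2])

# axis='index' aligns the Series with the row labels
result = df.le(budgets, axis='index')
print(result)
```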

For this example, we will be using Rivers Clothing to test for item prices less than or equal to 15.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.le(15)
print(result)
  • Line [1] creates a DataFrame from a Dictionary and saves it to df.
  • Line [2] compares each element and tests to see if the item price is less than or equal to 15. A True/False value is assigned based on the outcome.
  • Line [3] outputs the result to the terminal.

Output

    Tops  Coats  Pants  Tanks  Sweats
0   True  False  False   True   False
1  False  False  False   True   False
2  False  False  False  False   False

DataFrame Greater Than or Equal To

The ge() method is one of the comparison operators. It tests whether each DataFrame element is greater than or equal to (>=) the value passed as the first parameter (other).

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.ge(other, axis='columns', level=None)
Parameter | Description
--------- | -----------
other | Any single or multiple element data structure, such as a scalar, a list or other list-like object, a Series, or a DataFrame.
axis | If zero (0) or 'index', other is aligned with the row index and applied to each column. If one (1) or 'columns', other is aligned with the column labels and applied to each row. Default 'columns'.
level | An integer or a label. Broadcasts across the specified level, matching the index values on the passed MultiIndex level.

For this example, we will be using Rivers Clothing to test for item prices greater than or equal to 35.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.ge(35)
print(result)
  • Line [1] creates a DataFrame from a Dictionary and saves it to df.
  • Line [2] compares each element and tests to see if the item price is greater than or equal to 35. A True/False value is assigned based on the outcome.
  • Line [3] outputs the result to the terminal.

Output

    Tops  Coats  Pants  Tanks  Sweats
0  False   True  False  False   False
1  False   True   True  False   False
2  False   True   True  False    True
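Since True counts as 1 and False as 0, the result of ge() can be summarized directly. This minimal sketch counts the prices at or above 35 in each column and in the whole DataFrame, then lists the columns that contain at least one such price.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

mask = df.ge(35)

counts = mask.sum()              # True values per column
total = int(mask.values.sum())   # True values in the whole DataFrame
has_any = mask.any()             # True if a column has at least one match

print(counts)
print(total)
print(list(has_any[has_any].index))
```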

DataFrame Not Equal To

The ne() method is one of the comparison operators. It tests whether each DataFrame element is not equal to (!=) the value passed as the first parameter (other).

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.ne(other, axis='columns', level=None)
Parameter | Description
--------- | -----------
other | Any single or multiple element data structure, such as a scalar, a list or other list-like object, a Series, or a DataFrame.
axis | If zero (0) or 'index', other is aligned with the row index and applied to each column. If one (1) or 'columns', other is aligned with the column labels and applied to each row. Default 'columns'.
level | An integer or a label. Broadcasts across the specified level, matching the index values on the passed MultiIndex level.

For this example, we will be using Rivers Clothing to test for item prices not equal to 21.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.ne(21)
print(result)
  • Line [1] creates a DataFrame from a Dictionary and saves it to df.
  • Line [2] compares each element and tests to see if the item price is not equal to 21. A True/False value is assigned based on the outcome.
  • Line [3] outputs the result to the terminal.

Output

   Tops  Coats  Pants  Tanks  Sweats
0  True   True  False   True    True
1  True   True   True   True   False
2  True   True   True   True    True
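The other parameter can also be a second DataFrame, in which case ne() compares the two element by element. This minimal sketch uses hypothetical old and new price lists with identical labels to flag the prices that changed and the rows that contain a change.

```python
import pandas as pd

# Hypothetical price lists before and after an update
old = pd.DataFrame({'Tops': [15, 20, 25], 'Tanks': [11, 10, 19]})
new = pd.DataFrame({'Tops': [15, 22, 25], 'Tanks': [12, 10, 19]})

changed = new.ne(old)               # True where a price differs
rows_changed = changed.any(axis=1)  # True for rows with at least one change

print(changed)
print(list(rows_changed[rows_changed].index))
```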

DataFrame Equal To

The eq() method is one of the comparison operators. It tests whether each DataFrame element is equal to (==) the value passed as the first parameter (other).

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.eq(other, axis='columns', level=None)
Parameter | Description
--------- | -----------
other | Any single or multiple element data structure, such as a scalar, a list or other list-like object, a Series, or a DataFrame.
axis | If zero (0) or 'index', other is aligned with the row index and applied to each column. If one (1) or 'columns', other is aligned with the column labels and applied to each row. Default 'columns'.
level | An integer or a label. Broadcasts across the specified level, matching the index values on the passed MultiIndex level.

For this example, we will be using Rivers Clothing to test for item prices equal to 11.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.eq(11)
print(result)
  • Line [1] creates a DataFrame from a Dictionary and saves it to df.
  • Line [2] compares each element and tests to see if the item price equals 11. A True/False value is assigned based on the outcome.
  • Line [3] outputs the result to the terminal.

Output

    Tops  Coats  Pants  Tanks  Sweats
0  False  False  False   True   False
1  False  False  False  False   False
2  False  False  False  False   False
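One caveat with eq() and ne(): missing values never compare equal, so eq() returns False and ne() returns True wherever a NaN appears, even when both sides are NaN. DataFrame.equals(), by contrast, treats NaNs in the same location as equal. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data with a missing price (None is stored as NaN)
df = pd.DataFrame({'Tops': [15, None, 25]})

print(df.eq(df))      # the NaN position is False
print(df.ne(df))      # the NaN position is True
print(df.equals(df))  # NaNs in the same location count as equal
```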

DataFrame Combine

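The combine() method combines the DataFrame with another DataFrame column by column. For each column label, it passes the two matching columns (as Series) to the function func and uses the function's return value as the resulting column.

This method returns a new DataFrame built from the values that func returns for each column.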

The syntax for this method is as follows:

DataFrame.combine(other, func, fill_value=None, overwrite=True)
Parameter | Description
--------- | -----------
other | The DataFrame to merge column-wise.
func | A function that takes two (2) Series as inputs and returns a Series or a scalar. This function merges the two (2) DataFrames column by column.
fill_value | The value used to fill NaN values before any column is passed to func.
overwrite | If True (the default), columns in the calling DataFrame that do not exist in other are overwritten with NaN values.
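The func parameter does not have to pick whole columns. In this minimal sketch with hypothetical data, NumPy's np.minimum is passed as func, so the smaller value is taken element by element. Because Coats exists only in df1, the sketch also shows overwrite: with the default True that column becomes NaN, while overwrite=False keeps the original values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Coats' exists only in df1
df1 = pd.DataFrame({'Tops': [2, 12], 'Coats': [7, 8]})
df2 = pd.DataFrame({'Tops': [3, 10]})

# Element-wise minimum of each pair of matching columns
smallest = df1.combine(df2, np.minimum)
print(smallest)   # 'Coats' is all NaN because overwrite=True

# Keep columns of df1 that have no counterpart in df2
kept = df1.combine(df2, np.minimum, overwrite=False)
print(kept)
```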

For this example, we have two (2) DataFrames for Rivers Clothing to combine into a single DataFrame.

df1 = pd.DataFrame({'Tops':  [2, 5], 
                    'Tanks': [2, 9]})
df2 = pd.DataFrame({'Tops':  [3, 10], 
                    'Tanks': [4, 14]})

compact_me = lambda x, y: x if x.sum() > y.sum() else y
result = df1.combine(df2, compact_me)
print(result)
  • Line [1-2] creates two DataFrames and assigns them to df1 and df2.
  • Line [3] creates a lambda function called compact_me that compares the sum of a column from df1 (x) with the sum of the matching column from df2 (y) and returns the column with the larger sum (y if the sums are equal).
  • Line [4] does the following:
    • passes the DataFrame df2 and the compact_me function to the combine() method.
    • saves the output to the result variable.
  • Line [5] outputs the result to the terminal.

Output

   Tops  Tanks
0     3      4
1    10     14
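The fill_value parameter matters when the DataFrames contain missing values. In this minimal sketch with hypothetical data, np.maximum propagates NaN on its own; with fill_value=0, each NaN is replaced by 0 before the columns reach func, so the value from the other DataFrame is used instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing prices in df1
df1 = pd.DataFrame({'Tops': [2, None], 'Tanks': [None, 9]})
df2 = pd.DataFrame({'Tops': [3, 10], 'Tanks': [4, 14]})

no_fill = df1.combine(df2, np.maximum)               # NaN propagates
filled = df1.combine(df2, np.maximum, fill_value=0)  # NaN becomes 0 first

print(no_fill)
print(filled)
```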

DataFrame Combine First

The combine_first() method combines two (2) DataFrames by filling null (NaN or None) values in the calling DataFrame with non-null values from the other DataFrame. The row and column indexes of the resulting DataFrame are the union of the indexes of both DataFrames.

This method returns a new DataFrame: where the calling DataFrame has a non-null value, that value is kept; where it is null, the value from other is used.

The syntax for this method is as follows:

DataFrame.combine_first(other)
Parameter | Description
--------- | -----------
other | The DataFrame used to fill null values.

For this example, we have two (2) DataFrames for Rivers Clothing and combine them using the combine_first() method.

df1 = pd.DataFrame({'Tops':  [2, None], 
                    'Tanks': [None, 9]})
df2 = pd.DataFrame({'Tops':  [5, 10], 
                    'Tanks': [7, 18]})

result = df1.combine_first(df2)
print(result)
  • Line [1-2] creates two DataFrames and assigns them to df1 and df2.
  • Line [3] calls combine_first() on df1 and passes df2, so each None (NaN) value in df1 is filled with the corresponding value from df2.
  • Line [4] outputs the result to the terminal.

Output

   Tops  Tanks
0   2.0    7.0
1  10.0    9.0
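Two details of combine_first() are easy to miss: values in the calling DataFrame take precedence, and the result covers the union of both indexes. This minimal sketch uses hypothetical data with deliberately different row labels and calls the method in both directions.

```python
import pandas as pd

# Hypothetical data: a has rows 0 and 1, b has rows 0 and 2
a = pd.DataFrame({'Tops': [2.0, None]}, index=[0, 1])
b = pd.DataFrame({'Tops': [5.0, 10.0]}, index=[0, 2])

a_first = a.combine_first(b)   # a's values win where a is not null
b_first = b.combine_first(a)   # b's values win where b is not null

print(a_first)   # Tops: 2.0, NaN, 10.0 on rows 0, 1, 2
print(b_first)   # Tops: 5.0, NaN, 10.0 on rows 0, 1, 2
```

Row 1 stays NaN in both results because neither DataFrame has a non-null value there.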