Pandas DataFrame Comparison Operators and Combine - Part 3

The Pandas DataFrame has several binary operator methods. When applied to a DataFrame, these methods combine two DataFrames and return a new DataFrame with the appropriate result.

This is Part 3 of the following series on Pandas DataFrame operators:

Part 1: Pandas DataFrame Arithmetic Operators
Part 2: Pandas DataFrame Reverse Methods
Part 3: Pandas DataFrame Comparison Operators and Combine

Preparation

Before any data manipulation can occur, one (1) library must be installed.

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal and execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

DataFrame Less Than

The lt() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is Less Than (<) the value passed as the first parameter.

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.lt(other, axis='columns', level=None)

Parameter	Description
`other`	This can be a scalar, a list-like object, a Series, or a DataFrame.
`axis`	Determines how a list-like `other` is matched. If zero (0) or index, its items are matched to the row labels. If one (1) or columns (the default), its items are matched to the column labels.
`level`	This parameter can be an integer or a label. This parameter is broadcast across a specified level and matches the index values on the MultiIndex level passed.

For this example, we will be using Rivers Clothing to test for item prices less than 45.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.lt(45)
print(result)

Line [1] creates a DataFrame from a Dictionary and saves it to df.
Line [2] compares each element and tests to see if the item price is less than 45. A True/False value is assigned based on the outcome.
Line [3] outputs the result to the terminal.

Output

	Tops	Coats	Pants	Tanks	Sweats
0	True	True	True	True	True
1	True	False	False	True	True
2	True	False	False	True	True
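The `other` parameter does not have to be a single number. The sketch below, which assumes a made-up list of per-column price limits (the `limits` values are not from Rivers Clothing), compares each column against its own limit. With the default axis, the list needs one item per column.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

# Hypothetical limits: one value for each of the five columns, in order.
limits = [20, 50, 50, 15, 30]

# With the default axis='columns', each list item is matched to one column.
per_column = df.lt(limits)
print(per_column)

# For this case the < operator gives the same result as lt().
same = (df < limits).equals(per_column)
print(same)
```

The method form is mainly useful when you need the `axis` or `level` parameters, which the plain operator does not expose.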

DataFrame Greater Than

The gt() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is Greater Than (>) the value passed as the first parameter.

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.gt(other, axis='columns', level=None)

Parameter	Description
`other`	This can be a scalar, a list-like object, a Series, or a DataFrame.
`axis`	Determines how a list-like `other` is matched. If zero (0) or index, its items are matched to the row labels. If one (1) or columns (the default), its items are matched to the column labels.
`level`	This parameter can be an integer or a label. This parameter is broadcast across a specified level and matches the index values on the MultiIndex level passed.

For this example, we will be using Rivers Clothing to test for item prices that cost more than 25.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.gt(25)
print(result)

Line [1] creates a DataFrame from a Dictionary and saves it to df.
Line [2] compares each element and tests to see if the item price is greater than 25. A True/False value is assigned based on the outcome.
Line [3] outputs the result to the terminal.

Output

	Tops	Coats	Pants	Tanks	Sweats
0	False	True	False	False	True
1	False	True	True	False	False
2	False	True	True	False	True
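A Boolean DataFrame like the one above is often used as a mask rather than printed. As a minimal sketch (the variable names `over_25`, `counts`, and `masked` are chosen here for illustration), the code below counts the matching prices in each column and then blanks out the prices that fail the test.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

over_25 = df.gt(25)

# True counts as 1 when summed, so this is the number of prices above 25 per column.
counts = over_25.sum()
print(counts)

# Indexing with the Boolean DataFrame keeps matching prices and sets the rest to NaN.
masked = df[over_25]
print(masked)
```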

DataFrame Less Than or Equal To

The le() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is Less Than or Equal to (<=) the value passed as the first parameter.

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.le(other, axis='columns', level=None)

Parameter	Description
`other`	This can be a scalar, a list-like object, a Series, or a DataFrame.
`axis`	Determines how a list-like `other` is matched. If zero (0) or index, its items are matched to the row labels. If one (1) or columns (the default), its items are matched to the column labels.
`level`	This parameter can be an integer or a label. This parameter is broadcast across a specified level and matches the index values on the MultiIndex level passed.

For this example, we will be using Rivers Clothing to test for item prices less than or equal to 15.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.le(15)
print(result)

Line [1] creates a DataFrame from a Dictionary and saves it to df.
Line [2] compares each element and tests to see if the item price is less than or equal to 15. A True/False value is assigned based on the outcome.
Line [3] outputs the result to the terminal.

Output

	Tops	Coats	Pants	Tanks	Sweats
0	True	False	False	True	False
1	False	False	False	True	False
2	False	False	False	False	False
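The `axis` parameter matters once `other` is a list. The sketch below assumes a made-up limit for each row (15, 20, and 30) and passes axis='index' so that every price in a row is compared against that row's limit.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

# Hypothetical limits: one value for each of the three rows, in order.
row_limits = [15, 20, 30]

# axis='index' matches each list item to a row instead of a column.
per_row = df.le(row_limits, axis='index')
print(per_row)
```

Without axis='index', this call would fail, because a three-item list cannot be matched to five columns.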

DataFrame Greater Than or Equal To

The ge() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is Greater Than or Equal to (>=) the value passed as the first parameter.

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.ge(other, axis='columns', level=None)

Parameter	Description
`other`	This can be a scalar, a list-like object, a Series, or a DataFrame.
`axis`	Determines how a list-like `other` is matched. If zero (0) or index, its items are matched to the row labels. If one (1) or columns (the default), its items are matched to the column labels.
`level`	This parameter can be an integer or a label. This parameter is broadcast across a specified level and matches the index values on the MultiIndex level passed.

For this example, we will be using Rivers Clothing to test for item prices greater than or equal to 35.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.ge(35)
print(result)

Line [1] creates a DataFrame from a Dictionary and saves it to df.
Line [2] compares each element and tests to see if the item price is greater than or equal to 35. A True/False value is assigned based on the outcome.
Line [3] outputs the result to the terminal.

Output

	Tops	Coats	Pants	Tanks	Sweats
0	False	True	False	False	False
1	False	True	True	False	False
2	False	True	True	False	True
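When `other` is a Series, the comparison is matched by label instead of by position. The following sketch assumes a hypothetical Series of minimum prices (`minimums`) keyed by the same column names, then checks which rows meet the minimum in every column.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

# Hypothetical minimum price per item; the Series index holds the column names.
minimums = pd.Series({'Tops': 20, 'Coats': 88, 'Pants': 56,
                      'Tanks': 10, 'Sweats': 30})

# Each column is compared against the Series entry with the same label.
meets = df.ge(minimums)
print(meets)

# all(axis=1) is True only for rows where every column passed the test.
every_item = meets.all(axis=1)
print(every_item)
```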

DataFrame Not Equal To

The ne() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is Not Equal to (!=) the value passed as the first parameter.

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.ne(other, axis='columns', level=None)

Parameter	Description
`other`	This can be a scalar, a list-like object, a Series, or a DataFrame.
`axis`	Determines how a list-like `other` is matched. If zero (0) or index, its items are matched to the row labels. If one (1) or columns (the default), its items are matched to the column labels.
`level`	This parameter can be an integer or a label. This parameter is broadcast across a specified level and matches the index values on the MultiIndex level passed.

For this example, we will be using Rivers Clothing to test for item prices not equal to 21.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.ne(21)
print(result)

Line [1] creates a DataFrame from a Dictionary and saves it to df.
Line [2] compares each element and tests to see if the item price is not equal to 21. A True/False value is assigned based on the outcome.
Line [3] outputs the result to the terminal.

Output

	Tops	Coats	Pants	Tanks	Sweats
0	True	True	False	True	True
1	True	True	True	True	False
2	True	True	True	True	True
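Passing a second DataFrame as `other` compares the two element by element, which is one way to spot changes between two versions of the same data. The sketch below assumes a hypothetical updated price list (`df_new`) in which a single Coats price has been lowered.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

# Hypothetical updated prices: identical except for one Coats price.
df_new = df.copy()
df_new.loc[1, 'Coats'] = 80

# True marks every position where the two DataFrames differ.
changed = df.ne(df_new)
print(changed)

# Total number of changed prices across the whole DataFrame.
total_changed = int(changed.to_numpy().sum())
print(total_changed)
```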

DataFrame Equal To

The eq() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is Equal to (==) the value passed as the first parameter.

This method returns a DataFrame consisting of Boolean values from the comparisons.

The syntax for this method is as follows:

DataFrame.eq(other, axis='columns', level=None)

Parameter	Description
`other`	This can be a scalar, a list-like object, a Series, or a DataFrame.
`axis`	Determines how a list-like `other` is matched. If zero (0) or index, its items are matched to the row labels. If one (1) or columns (the default), its items are matched to the column labels.
`level`	This parameter can be an integer or a label. This parameter is broadcast across a specified level and matches the index values on the MultiIndex level passed.

For this example, we will be using Rivers Clothing to test for item prices equal to 11.

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})
result = df.eq(11)
print(result)

Line [1] creates a DataFrame from a Dictionary and saves it to df.
Line [2] compares each element and tests to see if the item price equals 11. A True/False value is assigned based on the outcome.
Line [3] outputs the result to the terminal.

Output

	Tops	Coats	Pants	Tanks	Sweats
0	False	False	False	True	False
1	False	False	False	False	False
2	False	False	False	False	False
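With mostly False values, it can be easier to ask where the match is than to scan the grid. The sketch below (the helper variable names are chosen here for illustration) reduces the eq() result to the column and row labels that contain the price 11.

```python
import pandas as pd

df = pd.DataFrame({'Tops':    [15, 20, 25],
                   'Coats':   [36, 88, 89],
                   'Pants':   [21, 56, 94],
                   'Tanks':   [11, 10, 19],
                   'Sweats':  [27, 21, 35]})

matches = df.eq(11)

# any() is True for each column that contains at least one match.
col_hits = matches.any()
cols_with_match = col_hits[col_hits].index.tolist()
print(cols_with_match)

# any(axis=1) does the same for each row.
row_hits = matches.any(axis=1)
rows_with_match = row_hits[row_hits].index.tolist()
print(rows_with_match)
```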

DataFrame Combine

The combine() method takes two (2) DataFrames and merges them column by column. A function you supply decides the result for each pair of matching columns.

This method returns a DataFrame containing the combined columns.

The syntax for this method is as follows:

DataFrame.combine(other, func, fill_value=None, overwrite=True)

Parameter	Description
`other`	This is the DataFrame to merge column-wise.
`func`	This function takes two (2) Series as inputs and returns a Series or Scalar. It is called once for each pair of matching columns.
`fill_value`	This parameter fills the NaN values before passing any column to the merge function.
`overwrite`	If set to `True` (the default), any columns in the calling DataFrame that do not exist in `other` are overwritten with NaN values.

For this example, we have two (2) DataFrames for Rivers Clothing to combine into a single DataFrame.

df1 = pd.DataFrame({'Tops':  [2, 5], 
                    'Tanks': [2, 9]})
df2 = pd.DataFrame({'Tops':  [3, 10], 
                    'Tanks': [4, 14]})

compact_me = lambda x, y: x if x.sum() > y.sum() else y
result = df1.combine(df2, compact_me)
print(result)

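Line [1-2] creates two DataFrames and assigns them to df1 and df2.
Line [3] creates a lambda function called compact_me. It receives a column from df1 (x) and the matching column from df2 (y) and returns whichever column has the larger sum.
Line [4] does the following:
- passes the DataFrame df2 and the compact_me function to the combine method.
- Then saves the output to the result variable.
Line [5] outputs the result to the terminal.

Both columns in df2 have a larger sum than their df1 counterparts (13 vs. 7 for Tops and 18 vs. 11 for Tanks), so the result matches df2.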

Output

	Tops	Tanks
0	3	4
1	10	14
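The function passed to combine() can also work element by element, since it receives whole columns. As a sketch, the example below uses NumPy's `minimum` function (NumPy is installed alongside Pandas) on two made-up DataFrames, not the ones above, to keep the smaller value at each position.

```python
import numpy as np
import pandas as pd

# Hypothetical quantities, chosen so that neither DataFrame wins every position.
df1 = pd.DataFrame({'Tops':  [2, 12],
                    'Tanks': [6, 9]})
df2 = pd.DataFrame({'Tops':  [3, 10],
                    'Tanks': [4, 14]})

# np.minimum compares the two columns position by position and returns a Series.
result = df1.combine(df2, np.minimum)
print(result)
```

Here the result mixes values from both DataFrames, unlike the lambda above, which always returns one whole column.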

DataFrame Combine First

The combine_first() method combines two (2) DataFrames by filling NULL values in one DataFrame with NON-NULL values from the other DataFrame. The row and column indexes of the resulting DataFrame are the union of the indexes of both DataFrames.

This method returns a DataFrame containing the combined data.

The syntax for this method is as follows:

DataFrame.combine_first(other)

Parameter	Description
`other`	This is the DataFrame provided and used to fill NULL values.

For this example, we have two (2) DataFrames for Rivers Clothing and combine them using the combine_first() method.

df1 = pd.DataFrame({'Tops':  [2, None], 
                    'Tanks': [None, 9]})
df2 = pd.DataFrame({'Tops':  [5, 10], 
                    'Tanks': [7, 18]})

result = df1.combine_first(df2)
print(result)

Line [1-2] creates two DataFrames and assigns them to df1 and df2.
Line [3] fills each None value in df1 with the value at the same position in df2. Values already present in df1 are kept.
Line [4] outputs the result to the terminal.

Output

	Tops	Tanks
0	2.0	7.0
1	10.0	9.0
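The union behavior is easier to see when the two DataFrames do not share the same rows and columns. The sketch below assumes a made-up second DataFrame with an extra `Socks` column and an extra row label (2), neither of which exists in df1.

```python
import pandas as pd

df1 = pd.DataFrame({'Tops':  [2, None],
                    'Tanks': [None, 9]})

# Hypothetical data: different columns and row labels 1 and 2 instead of 0 and 1.
df2 = pd.DataFrame({'Tanks': [7, 18],
                    'Socks': [1, 3]},
                   index=[1, 2])

result = df1.combine_first(df2)
print(result)

# Values in df1 win where present; df2 only fills gaps and adds new labels.
print(result.loc[1, 'Tanks'])
print(result.loc[2, 'Tanks'])
```

Positions that are missing in both DataFrames, such as Tanks in row 0, stay NaN in the result.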