The Pandas DataFrame has several binary operator methods. When applied to a DataFrame, these methods combine it with another object (such as a scalar, a Series, or a second DataFrame) and return a new DataFrame holding the result.
This is Part 3 of the following series on Pandas DataFrame operators:
- Part 1: Pandas DataFrame Arithmetic Operators
- Part 2: Pandas DataFrame Reverse Methods
- Part 3: Pandas DataFrame Comparison Operators and Combine
Preparation
Before any data manipulation can occur, one (1) new library will require installation.
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
DataFrame Less Than
The lt() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is less than (<) the value passed as the first parameter.
This method returns a DataFrame consisting of Boolean values from the comparisons.
The syntax for this method is as follows:
DataFrame.lt(other, axis='columns', level=None)
Parameter | Description |
---|---|
other | A scalar, sequence, Series, or DataFrame to compare against. |
axis | Whether to compare by the index (0 or 'index') or by the columns (1 or 'columns'). Default 'columns'. |
level | This parameter can be an integer or a label. The comparison is broadcast across the specified level, matching the index values on the MultiIndex level passed. |
For this example, we will be using Rivers Clothing to test for item prices less than 45.
df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89], 'Pants': [21, 56, 94], 'Tanks': [11, 10, 19], 'Sweats': [27, 21, 35]})
result = df.lt(45)
print(result)
- Line [1] creates a DataFrame from a Dictionary and saves it to df.
- Line [2] compares each element and tests to see if the item price is less than 45. A True/False value is assigned based on the outcome.
- Line [3] outputs the result to the terminal.
Output
  | Tops | Coats | Pants | Tanks | Sweats |
---|---|---|---|---|---|
0 | True | True | True | True | True |
1 | True | False | False | True | True |
2 | True | False | False | True | True |
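The Boolean DataFrame returned by lt() is usually a stepping stone rather than the final answer. The sketch below, which reuses the Rivers Clothing data, shows that lt(45) matches the < operator and then uses the result as a mask to keep only the rows where every price is below 45. The variable names mask, same, and cheap_rows are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89],
                   'Pants': [21, 56, 94], 'Tanks': [11, 10, 19],
                   'Sweats': [27, 21, 35]})

# Boolean DataFrame: True where the price is less than 45.
mask = df.lt(45)

# The method form and the operator form produce the same result.
same = mask.equals(df < 45)
print(same)

# Keep only the rows in which every item costs less than 45.
cheap_rows = df[mask.all(axis=1)]
print(cheap_rows)
```

With this data, only the first row (index 0) passes the test for every column.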
DataFrame Greater Than
The gt() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is greater than (>) the value passed as the first parameter.
This method returns a DataFrame consisting of Boolean values from the comparisons.
The syntax for this method is as follows:
DataFrame.gt(other, axis='columns', level=None)
Parameter | Description |
---|---|
other | A scalar, sequence, Series, or DataFrame to compare against. |
axis | Whether to compare by the index (0 or 'index') or by the columns (1 or 'columns'). Default 'columns'. |
level | This parameter can be an integer or a label. The comparison is broadcast across the specified level, matching the index values on the MultiIndex level passed. |
For this example, we will be using Rivers Clothing to test for item prices that cost more than 25.
df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89], 'Pants': [21, 56, 94], 'Tanks': [11, 10, 19], 'Sweats': [27, 21, 35]})
result = df.gt(25)
print(result)
- Line [1] creates a DataFrame from a Dictionary and saves it to df.
- Line [2] compares each element and tests to see if the item price is greater than 25. A True/False value is assigned based on the outcome.
- Line [3] outputs the result to the terminal.
Output
  | Tops | Coats | Pants | Tanks | Sweats |
---|---|---|---|---|---|
0 | False | True | False | False | True |
1 | False | True | True | False | False |
2 | False | True | True | False | True |
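The other parameter does not have to be a single number. As a minimal sketch, the example below passes a Series whose index matches the column names, so each column is compared against its own threshold. The threshold values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89],
                   'Pants': [21, 56, 94], 'Tanks': [11, 10, 19],
                   'Sweats': [27, 21, 35]})

# Hypothetical per-column thresholds, labeled with the column names.
thresholds = pd.Series({'Tops': 20, 'Coats': 50, 'Pants': 50,
                        'Tanks': 10, 'Sweats': 25})

# With the default axis='columns', the Series index is aligned
# with the DataFrame columns before comparing.
result = df.gt(thresholds)
print(result)
```

Here the Tops column is tested against 20, the Coats column against 50, and so on.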
DataFrame Less Than or Equal To
The le() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is less than or equal to (<=) the value passed as the first parameter.
This method returns a DataFrame consisting of Boolean values from the comparisons.
The syntax for this method is as follows:
DataFrame.le(other, axis='columns', level=None)
Parameter | Description |
---|---|
other | A scalar, sequence, Series, or DataFrame to compare against. |
axis | Whether to compare by the index (0 or 'index') or by the columns (1 or 'columns'). Default 'columns'. |
level | This parameter can be an integer or a label. The comparison is broadcast across the specified level, matching the index values on the MultiIndex level passed. |
For this example, we will be using Rivers Clothing to test for item prices less than or equal to 15.
df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89], 'Pants': [21, 56, 94], 'Tanks': [11, 10, 19], 'Sweats': [27, 21, 35]})
result = df.le(15)
print(result)
- Line [1] creates a DataFrame from a Dictionary and saves it to df.
- Line [2] compares each element and tests to see if the item price is less than or equal to 15. A True/False value is assigned based on the outcome.
- Line [3] outputs the result to the terminal.
Output
  | Tops | Coats | Pants | Tanks | Sweats |
---|---|---|---|---|---|
0 | True | False | False | True | False |
1 | False | False | False | True | False |
2 | False | False | False | False | False |
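The axis parameter matters once other is a Series. The following sketch assumes a hypothetical price cap per row (the caps Series is not part of the original example) and uses axis='index' so that each row is compared against its own cap.

```python
import pandas as pd

df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89],
                   'Pants': [21, 56, 94], 'Tanks': [11, 10, 19],
                   'Sweats': [27, 21, 35]})

# Hypothetical per-row price caps, labeled with the same index as df.
caps = pd.Series([15, 20, 30], index=df.index)

# axis='index' aligns caps with the rows instead of the columns.
result = df.le(caps, axis='index')
print(result)
```

Row 0 is tested against 15, row 1 against 20, and row 2 against 30.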
DataFrame Greater Than or Equal To
The ge() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is greater than or equal to (>=) the value passed as the first parameter.
This method returns a DataFrame consisting of Boolean values from the comparisons.
The syntax for this method is as follows:
DataFrame.ge(other, axis='columns', level=None)
Parameter | Description |
---|---|
other | A scalar, sequence, Series, or DataFrame to compare against. |
axis | Whether to compare by the index (0 or 'index') or by the columns (1 or 'columns'). Default 'columns'. |
level | This parameter can be an integer or a label. The comparison is broadcast across the specified level, matching the index values on the MultiIndex level passed. |
For this example, we will be using Rivers Clothing to test for item prices greater than or equal to 35.
df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89], 'Pants': [21, 56, 94], 'Tanks': [11, 10, 19], 'Sweats': [27, 21, 35]})
result = df.ge(35)
print(result)
- Line [1] creates a DataFrame from a Dictionary and saves it to df.
- Line [2] compares each element and tests to see if the item price is greater than or equal to 35. A True/False value is assigned based on the outcome.
- Line [3] outputs the result to the terminal.
Output
  | Tops | Coats | Pants | Tanks | Sweats |
---|---|---|---|---|---|
0 | False | True | False | False | False |
1 | False | True | True | False | False |
2 | False | True | True | False | True |
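Because True counts as 1 and False as 0, the Boolean result can be summarized directly. This short sketch counts, per column, how many prices are greater than or equal to 35, and checks whether any price in a column meets the test. The names counts and has_any are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89],
                   'Pants': [21, 56, 94], 'Tanks': [11, 10, 19],
                   'Sweats': [27, 21, 35]})

at_least_35 = df.ge(35)

# Number of prices per column that are greater than or equal to 35.
counts = at_least_35.sum()
print(counts)

# True for each column that has at least one such price.
has_any = at_least_35.any()
print(has_any)
```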
DataFrame Not Equal To
The ne() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is not equal to (!=) the value passed as the first parameter.
This method returns a DataFrame consisting of Boolean values from the comparisons.
The syntax for this method is as follows:
DataFrame.ne(other, axis='columns', level=None)
Parameter | Description |
---|---|
other | A scalar, sequence, Series, or DataFrame to compare against. |
axis | Whether to compare by the index (0 or 'index') or by the columns (1 or 'columns'). Default 'columns'. |
level | This parameter can be an integer or a label. The comparison is broadcast across the specified level, matching the index values on the MultiIndex level passed. |
For this example, we will be using Rivers Clothing to test for item prices not equal to 21.
df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89], 'Pants': [21, 56, 94], 'Tanks': [11, 10, 19], 'Sweats': [27, 21, 35]})
result = df.ne(21)
print(result)
- Line [1] creates a DataFrame from a Dictionary and saves it to df.
- Line [2] compares each element and tests to see if the item price is not equal to 21. A True/False value is assigned based on the outcome.
- Line [3] outputs the result to the terminal.
Output
  | Tops | Coats | Pants | Tanks | Sweats |
---|---|---|---|---|---|
0 | True | True | False | True | True |
1 | True | True | True | True | False |
2 | True | True | True | True | True |
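The ne() method also accepts a second DataFrame, in which case the elements are compared position by position after the labels are aligned. As a sketch, the example below compares two hypothetical price lists (old_prices and new_prices are invented for illustration) to find which prices changed.

```python
import pandas as pd

# Two hypothetical price lists with identical row and column labels.
old_prices = pd.DataFrame({'Tops': [15, 20, 25], 'Tanks': [11, 10, 19]})
new_prices = pd.DataFrame({'Tops': [15, 22, 25], 'Tanks': [11, 10, 17]})

# True wherever the new price differs from the old price.
changed = new_prices.ne(old_prices)
print(changed)

# Total number of changed prices across the whole DataFrame.
num_changed = int(changed.sum().sum())
print(num_changed)
```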
DataFrame Equal To
The eq() method is one of the comparison operators. This method tests each DataFrame element to determine whether it is equal to (==) the value passed as the first parameter.
This method returns a DataFrame consisting of Boolean values from the comparisons.
The syntax for this method is as follows:
DataFrame.eq(other, axis='columns', level=None)
Parameter | Description |
---|---|
other | A scalar, sequence, Series, or DataFrame to compare against. |
axis | Whether to compare by the index (0 or 'index') or by the columns (1 or 'columns'). Default 'columns'. |
level | This parameter can be an integer or a label. The comparison is broadcast across the specified level, matching the index values on the MultiIndex level passed. |
For this example, we will be using Rivers Clothing to test for item prices equal to 11.
df = pd.DataFrame({'Tops': [15, 20, 25], 'Coats': [36, 88, 89], 'Pants': [21, 56, 94], 'Tanks': [11, 10, 19], 'Sweats': [27, 21, 35]})
result = df.eq(11)
print(result)
- Line [1] creates a DataFrame from a Dictionary and saves it to df.
- Line [2] compares each element and tests to see if the item price equals 11. A True/False value is assigned based on the outcome.
- Line [3] outputs the result to the terminal.
Output
  | Tops | Coats | Pants | Tanks | Sweats |
---|---|---|---|---|---|
0 | False | False | False | True | False |
1 | False | False | False | False | False |
2 | False | False | False | False | False |
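One detail worth knowing when using eq() is how missing values behave: NaN is never equal to anything, including another NaN. The sketch below uses a small made-up DataFrame with one missing price to illustrate this, and contrasts eq() with the separate DataFrame.equals() method, which treats NaN values in the same location as equal and returns a single Boolean.

```python
import pandas as pd

# A small illustrative DataFrame with one missing price.
df = pd.DataFrame({'Tops': [15, None], 'Tanks': [11, 10]})

# Element-wise comparison: the NaN cell compares as False, even against itself.
elementwise = df.eq(df)
print(elementwise)

# Whole-object comparison: NaN values in the same location count as equal.
whole = df.equals(df)
print(whole)
```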
DataFrame Combine
The combine() method takes two (2) DataFrames and merges them column by column, using the function passed in the func parameter to decide the values of each resulting column.
This method returns a DataFrame containing the combined columns.
The syntax for this method is as follows:
DataFrame.combine(other, func, fill_value=None, overwrite=True)
Parameter | Description |
---|---|
other | This is the DataFrame to merge column-wise. |
func | This parameter takes two (2) Series as inputs and returns a Series or Scalar. This function merges two (2) DataFrames column-by-column. |
fill_value | This parameter fills the NaN values before passing any column to the merge function. |
overwrite | If set to True, any columns in the calling DataFrame that do not exist in the other DataFrame are overwritten with NaN values. |
For this example, we have two (2) DataFrames for Rivers Clothing to combine into a single DataFrame.
df1 = pd.DataFrame({'Tops': [2, 5], 'Tanks': [2, 9]})
df2 = pd.DataFrame({'Tops': [3, 10], 'Tanks': [4, 14]})
compact_me = lambda x, y: x if x.sum() > y.sum() else y
result = df1.combine(df2, compact_me)
print(result)
- Line [1-2] creates two DataFrames and assigns them to df1 and df2.
- Line [3] creates a lambda function called compact_me that compares the sums of two matching columns from df1 and df2 and keeps the column with the larger sum (the df2 column on a tie).
- Line [4] does the following:
  - passes the DataFrame df2 and the compact_me function to the combine method.
  - Then saves the output to the result variable.
- Line [5] outputs the result to the terminal.
Output
  | Tops | Tanks |
---|---|---|
0 | 3 | 4 |
1 | 10 | 14 |
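The fill_value parameter is easier to see with a missing value in play. In this sketch, the data is changed slightly from the example above (one None in df1 and different df2 values, both chosen for illustration), and NumPy's np.maximum is used as the merge function so that the larger value is kept element by element.

```python
import numpy as np
import pandas as pd

# Illustrative data: df1 has one missing Tanks value.
df1 = pd.DataFrame({'Tops': [2, 5], 'Tanks': [None, 9]})
df2 = pd.DataFrame({'Tops': [3, 1], 'Tanks': [4, 6]})

# fill_value=0 replaces NaN before each pair of columns reaches np.maximum,
# which then keeps the larger of the two values at every position.
result = df1.combine(df2, np.maximum, fill_value=0)
print(result)
```

Without fill_value, the missing Tanks value would be passed to the function as NaN instead of 0.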
DataFrame Combine First
The combine_first() method combines two (2) DataFrames by filling NULL values in one DataFrame with NON-NULL values from the other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the indexes of both DataFrames.
This method returns a DataFrame in which the missing values of the calling DataFrame are filled from the other DataFrame.
The syntax for this method is as follows:
DataFrame.combine_first(other)
Parameter | Description |
---|---|
other | This is the DataFrame provided and used to fill NULL values. |
For this example, we have two (2) DataFrames for Rivers Clothing, which we combine using the combine_first() method.
df1 = pd.DataFrame({'Tops': [2, None], 'Tanks': [None, 9]})
df2 = pd.DataFrame({'Tops': [5, 10], 'Tanks': [7, 18]})
result = df1.combine_first(df2)
print(result)
- Line [1-2] creates two DataFrames and assigns them to df1 and df2.
- Line [3] fills the missing (None) values in df1 with the values at the same positions in df2. The values already present in df1 are kept.
- Line [4] outputs the result to the terminal.
Output
  | Tops | Tanks |
---|---|---|
0 | 2.0 | 7.0 |
1 | 10.0 | 9.0 |
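The union behavior described above shows up when the two DataFrames do not share the same labels. In this sketch, df2 has an extra row and an extra Coats column, both invented for illustration. The result contains every row and column found in either DataFrame.

```python
import pandas as pd

# Illustrative data: df2 has one more row and one more column than df1.
df1 = pd.DataFrame({'Tops': [2, None]})
df2 = pd.DataFrame({'Tops': [5, 10, 15], 'Coats': [7, 18, 20]})

# Values from df1 win where present; everything else comes from df2.
result = df1.combine_first(df2)
print(result)
```

The Tops value in row 0 stays at 2 because df1 already had a value there, while row 2 and the Coats column come entirely from df2.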