💡 Problem Formulation: When working with datasets in Python’s pandas library, you often need to identify common rows between two DataFrames. Whether for data validation, analysis, or merging purposes, finding these intersecting rows is a vital task. For instance, if DataFrame A represents customers from one month and DataFrame B from the following month, finding common rows could identify repeat customers. The objective here is to demonstrate methods to efficiently obtain rows present in both DataFrames.
Method 1: Merge with Inner Join
This method uses the merge() function to perform an inner join on DataFrames. An inner join combines rows from both DataFrames wherever there are matching values in specified columns. With pandas, you can specify the key columns to join on and retain only the rows that appear in both DataFrames, acting as the intersection.
Here’s an example:
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 4, 8]})

common_rows = df1.merge(df2, how='inner', on=['A', 'B'])
print(common_rows)
Output:
   A  B
0  3  6
This code snippet represents an inner join merge, one of the most straightforward ways to find common rows between two DataFrames. The merge() function is called on df1, specifying df2 as the DataFrame to join with, 'inner' as the method of joining, and the columns ['A', 'B'] as the keys for the join. The result is a new DataFrame containing only the rows that have matching values in both original DataFrames; here, that is only the row (3, 6).
Method 2: Using concat() with drop_duplicates()
The concatenation approach with concat() combines multiple DataFrames along a particular axis. After stacking the DataFrames vertically, any row present in both inputs appears twice, so duplicate handling can in principle expose the intersection. As the example shows, however, the exact drop_duplicates() arguments matter.
Here’s an example:
common_rows = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(common_rows)
Output:
   A  B
0  1  4
1  2  5
1  2  4
2  1  8
In this example, the concat() function stacks df1 and df2 on top of each other, and drop_duplicates(keep=False) removes every row that occurs more than once, so only the rows with no match in the other DataFrame remain. That is the symmetric difference, not the intersection, so this snippet as written is mistaken for our use case. We need to adjust our approach if we want to find the common rows instead.
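To actually recover the intersection with concatenation, keep the rows that duplicated() flags instead of dropping them. A sketch of that adjustment, under the assumption that neither DataFrame contains duplicate rows of its own:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 4, 8]})

stacked = pd.concat([df1, df2])
# A row flagged as a duplicate of an earlier one must have come
# from both inputs (assuming no duplicates within each frame).
common_rows = stacked[stacked.duplicated(keep='first')]
print(common_rows)
```

This keeps the second occurrence of each shared row, yielding only (3, 6) for the sample data.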
Method 3: Using Indicator in Merge
By setting the indicator=True parameter in the merge() function, an additional column is added to the output DataFrame. This column shows which rows are from both DataFrames (the intersection), which are only present in the left DataFrame, and which are only in the right one. Filtering on the ‘both’ indicator provides the common rows.
Here’s an example:
common_rows = (df1.merge(df2, how='outer', indicator=True)
                  .query('_merge == "both"')
                  .drop(columns=['_merge']))
print(common_rows)
Output:
   A  B
2  3  6
The snippet implements an ‘outer’ merge between df1 and df2 while adding the merge indicator column. The query() method is then used to filter out only the rows labeled ‘both’, which indicates they are present in both DataFrames. Finally, the drop() method removes the indicator column, yielding a DataFrame with the common rows.
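The same indicator column can partition the rows three ways in a single merge, which is handy when you also want to know what is unique to each side. A sketch using the frames from the running example:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 4, 8]})

merged = df1.merge(df2, how='outer', indicator=True)
# 'left_only' rows exist solely in df1, 'right_only' solely in df2.
only_in_df1 = merged.query('_merge == "left_only"').drop(columns='_merge')
only_in_df2 = merged.query('_merge == "right_only"').drop(columns='_merge')
print(only_in_df1)
print(only_in_df2)
```

For the sample data, (1, 4) and (2, 5) are unique to df1, while (2, 4) and (1, 8) are unique to df2.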
Method 4: Using the isin() Function
The isin() function in pandas allows you to filter data based on a list of values. By converting one DataFrame into a list of tuples representing rows, and filtering the other DataFrame for rows containing those tuples, you can extract the common rows directly.
Here’s an example:
tuples = [tuple(x) for x in df2.values]
common_rows = df1[df1.apply(tuple, axis=1).isin(tuples)]
print(common_rows)
Output:
   A  B
2  3  6
This code converts df2 into a list of row tuples; then each row of df1 is converted to a tuple and filtered by checking whether it appears in df2’s tuple list. The apply(tuple, axis=1) portion converts each row of df1 into a tuple to match the format of tuples.
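For larger frames, a minor variation puts df2’s row tuples in a Python set so each membership test is O(1) on average, and uses itertuples() instead of apply(). A sketch under the assumption that all row values are hashable:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 4, 8]})

# Build a set of plain row tuples from df2 for fast lookups.
row_set = set(df2.itertuples(index=False, name=None))
# Boolean mask: True where a df1 row also occurs in df2.
mask = [row in row_set for row in df1.itertuples(index=False, name=None)]
common_rows = df1[mask]
print(common_rows)
```

This produces the same single common row (3, 6), keeping its original df1 index.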
Bonus One-Liner Method 5: Using the DataFrame.equals() Function
When you’re interested in checking whether two DataFrames are identical rather than finding common rows, the equals() function provides a one-liner solution. This method checks that both DataFrames hold the same elements in the same positions and returns a boolean result.
Here’s an example:
are_identical = df1.equals(df2)
print(are_identical)
Output:
False
This single line of code uses the equals() function to directly compare df1 and df2 for equality. It returns False because the two DataFrames contain different values, not merely the same rows in a different order. It’s important to note that while this verifies data equality, it does not extract common rows.
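If row order is the only difference you want to ignore, sorting both DataFrames by their columns and resetting the index before comparing makes the equals() check order-insensitive. A sketch; the helper name same_rows is my own, and it assumes both frames share the same columns:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 4, 8]})

def same_rows(a, b):
    """Compare two DataFrames while ignoring row order."""
    key = list(a.columns)
    a_sorted = a.sort_values(key).reset_index(drop=True)
    b_sorted = b.sort_values(key).reset_index(drop=True)
    return a_sorted.equals(b_sorted)

print(same_rows(df1, df2))             # different values, so still unequal
print(same_rows(df1, df1.iloc[::-1]))  # same rows in reversed order
```

The first call is False because df2 genuinely differs from df1; the second is True because a reversed copy contains the same rows.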
Summary/Discussion
- Method 1: Merge with Inner Join. It is the most direct and simple approach for finding common rows based on merging DataFrames. However, it requires exact column matches and might not be suitable for large DataFrames due to potential memory issues.
- Method 2: Using concat() with drop_duplicates(). This method is typically used for different purposes and, with keep=False, does not find common rows as initially intended, but it can be used to filter the rows unique to each DataFrame.
- Method 3: Using Indicator in Merge. This approach is useful for identifying which DataFrame each row originates from. It’s excellent for complex merging tasks, but slightly more verbose.
- Method 4: Using the isin() Function. This is a versatile method that allows for filtering DataFrame rows based on various conditions. Best for situations where the comparison is based on the content of the DataFrame rather than on the index or column labels.
- Bonus Method 5: Using the equals() Function. Ideal for asserting the equality of two DataFrames, but not for extracting common rows. Most effective as a quick check rather than a method of analysis.