💡 Problem Formulation: When working with data frames in Python’s Pandas library, it’s common to encounter the need to extract unique values across multiple columns. For instance, if you have a dataframe with two columns representing ‘Product’ and ‘Manufacturer’, you may want to retrieve a list of all unique combinations. The desired output is a list or a dataframe containing only the distinct pairs from the specified columns.
Method 1: Using drop_duplicates()
This method selects the relevant columns of the dataframe and then applies the drop_duplicates() function to remove duplicate rows, keeping only the unique combinations of values.
Here’s an example:
import pandas as pd

# Sample dataframe
df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Manufacturer': ['FreshFruits', 'TreeTop', 'FreshFruits', 'CitrusCo', 'TreeTop']
})

# Finding unique values
unique_combinations = df[['Product', 'Manufacturer']].drop_duplicates()
print(unique_combinations)
Output:
  Product Manufacturer
0   Apple  FreshFruits
1  Banana      TreeTop
3  Orange     CitrusCo
In this example, we first define a dataframe with product and manufacturer columns. By selecting these columns and applying drop_duplicates(), we obtain a new dataframe with only the unique (Product, Manufacturer) pairs. This method is straightforward and convenient for smaller datasets or when working with only a few columns.
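Note that the result keeps the original row labels (0, 1, 3 above). If you prefer a clean zero-based index, drop_duplicates() also accepts an ignore_index argument (available since pandas 1.0); a small sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Manufacturer': ['FreshFruits', 'TreeTop', 'FreshFruits', 'CitrusCo', 'TreeTop']
})

# keep='first' (the default) retains the first occurrence of each pair;
# ignore_index=True renumbers the result 0..n-1 instead of keeping 0, 1, 3
unique_pairs = df[['Product', 'Manufacturer']].drop_duplicates(ignore_index=True)
print(unique_pairs)
```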
Method 2: Using unique() with groupby()
Another approach is to group the dataframe by the columns of interest; since the keys produced by groupby() are unique by construction, extracting them yields the distinct combinations. This can be especially handy when dealing with multiple groups of values.
Here’s an example:
# Grouping by 'Product' and 'Manufacturer' and keeping only the unique groups
unique_groups = df.groupby(['Product', 'Manufacturer']).size().reset_index().iloc[:, :-1]
print(unique_groups)
Output:
  Product Manufacturer
0   Apple  FreshFruits
1  Banana      TreeTop
2  Orange     CitrusCo
By grouping the dataframe with groupby() on the specified columns and calling .size(), Pandas gives us the count of each group. Since we are only interested in the unique groups, we reset the index and discard the count column, resulting in the unique (Product, Manufacturer) pairs. This approach is flexible and can easily extend to more complex situations.
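If you only need the key pairs themselves and not a dataframe, one variation is to read them straight off the GroupBy object; a sketch under the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Manufacturer': ['FreshFruits', 'TreeTop', 'FreshFruits', 'CitrusCo', 'TreeTop']
})

# .groups maps each (Product, Manufacturer) key tuple to its row labels;
# the keys are unique by construction, so no de-duplication step is needed
group_keys = list(df.groupby(['Product', 'Manufacturer']).groups.keys())
print(group_keys)
```

Because groupby() sorts keys by default, the tuples come back in sorted order rather than order of first appearance.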
Method 3: Using value_counts()
Pandas has a value_counts() method that can be used to quickly find unique row combinations by counting occurrences. This method is particularly efficient for obtaining frequency counts alongside the unique values.
Here’s an example:
# Getting value counts of combinations
unique_values_counts = df.value_counts(subset=['Product', 'Manufacturer']).reset_index()
print(unique_values_counts)
Output:
  Product Manufacturer  0
0   Apple  FreshFruits  2
1  Banana      TreeTop  2
2  Orange     CitrusCo  1
Using value_counts(), we extract the unique combinations of the ‘Product’ and ‘Manufacturer’ columns along with their frequencies. Resetting the index then turns the result into a clean dataframe. This method not only gives us the unique values but also counts how many times each combination appears in the dataset, which can be invaluable for data analysis tasks.
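The count column in the output above gets the default label 0 on older pandas versions; Series.reset_index() accepts a name argument to label it something meaningful, as this sketch shows:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Manufacturer': ['FreshFruits', 'TreeTop', 'FreshFruits', 'CitrusCo', 'TreeTop']
})

# value_counts() returns a Series keyed by a (Product, Manufacturer)
# MultiIndex; naming the values column makes the result self-describing
counts = df.value_counts(subset=['Product', 'Manufacturer'])
unique_with_counts = counts.reset_index(name='count')
print(unique_with_counts)
```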
Method 4: Using the numpy Library
If you’re willing to work with arrays, NumPy’s unique() function can be used in conjunction with Pandas to find unique values from multiple columns.
Here’s an example:
import numpy as np

# Converting the columns to a numpy array; np.unique() with an axis
# argument does not support object dtype, so cast the values to str
np_array = df[['Product', 'Manufacturer']].to_numpy(dtype=str)

# Using numpy's unique function to get unique rows
unique_rows = np.unique(np_array, axis=0)
print(unique_rows)
Output:
[['Apple' 'FreshFruits']
 ['Banana' 'TreeTop']
 ['Orange' 'CitrusCo']]
In the example, we convert the selected dataframe columns to a numpy array and call np.unique() with the axis=0 argument to find unique rows; note that the result also comes back sorted. The output is a numpy array of unique product and manufacturer combinations. This method is highly performant for large datasets and takes advantage of NumPy’s optimized array operations.
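If you want the result back as a dataframe rather than a raw array, the unique rows can be wrapped again with the original column names; a short sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Manufacturer': ['FreshFruits', 'TreeTop', 'FreshFruits', 'CitrusCo', 'TreeTop']
})

cols = ['Product', 'Manufacturer']
# axis=0 treats each row as one element, so whole (Product, Manufacturer)
# pairs are de-duplicated; cast to str since object dtype is unsupported
unique_rows = np.unique(df[cols].to_numpy(dtype=str), axis=0)

# Rebuild a dataframe with the original column names
unique_df = pd.DataFrame(unique_rows, columns=cols)
print(unique_df)
```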
Bonus One-Liner Method 5: Using pandas.concat()

For a quick and dirty one-liner solution, you can concatenate the columns of interest and then apply unique() directly to get all distinct values.
Here’s an example:
# One-liner to find unique values by concatenating the columns
unique_one_liner = pd.concat([df['Product'], df['Manufacturer']]).unique()
print(unique_one_liner)
Output:
['Apple' 'Banana' 'Orange' 'FreshFruits' 'TreeTop' 'CitrusCo']
This one-liner concatenates the ‘Product’ and ‘Manufacturer’ columns and applies unique() to find all distinct values across both columns. Simple and concise, this method is best for quickly finding unique values without concern for column association.
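An equivalent one-liner that avoids building an intermediate concatenated Series is to ravel the underlying array and pass it to pd.unique(); walking the array column by column (order='F') reproduces the same ordering as the concat version above:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'],
    'Manufacturer': ['FreshFruits', 'TreeTop', 'FreshFruits', 'CitrusCo', 'TreeTop']
})

# ravel(order='F') flattens column-major, so all Product values come
# before all Manufacturer values; pd.unique keeps first-seen order
flat_unique = pd.unique(df[['Product', 'Manufacturer']].to_numpy().ravel(order='F'))
print(flat_unique)
```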
Summary/Discussion
- Method 1: Using drop_duplicates(). Straightforward and easy to understand. May not be as efficient for very large datasets.
- Method 2: Using unique() with groupby(). Offers more control and can deal with more complex groupings. Slightly less straightforward than Method 1.
- Method 3: Using value_counts(). Efficient and offers frequency information. May provide more information than needed if you only want unique values.
- Method 4: Using the numpy Library. Highly performant and suggested for larger datasets. Requires additional knowledge of numpy.
- Bonus Method 5: Using pandas.concat(). Quick one-liner. Does not maintain the relationship between values from different columns.