5 Best Ways to Filter and Sort Rows in Python

πŸ’‘ Problem Formulation: When working with data in Python, a common requirement is to filter and sort rows based on specific criteria. For instance, given a dataset of products with attributes like name, price, and rating, the goal may be to filter out products below a certain rating and then sort the remaining items by price in ascending order. This article explores various methods to achieve such tasks effectively in Python.

Method 1: Using List Comprehensions and the sorted() Function

Python provides an elegant way to filter and sort data using list comprehensions coupled with the built-in sorted() function. List comprehensions offer a concise way to create lists, allowing for conditional filtering. The sorted() function sorts the list based on the specified key and returns a new sorted list.

Here’s an example:

items = [{'name': 'Book', 'price': 10, 'rating': 4.5},
         {'name': 'Pen', 'price': 5, 'rating': 3.2},
         {'name': 'Laptop', 'price': 1000, 'rating': 4.7}]

# Filter and sort
filtered_sorted_items = sorted(
    [item for item in items if item['rating'] >= 4],
    key=lambda x: x['price']
)

print(filtered_sorted_items)

Output:

[{'name': 'Book', 'price': 10, 'rating': 4.5}, {'name': 'Laptop', 'price': 1000, 'rating': 4.7}]

This code snippet filters items that have a rating of 4 or higher and sorts them by their price in ascending order. List comprehensions filter the items, and the sorted() function takes a lambda function as the sorting key.

Method 2: Using the Pandas Library

The Pandas library is powerful for data manipulation and analysis. It provides intuitive methods for filtering and sorting data using DataFrames. The query() method allows filtering, while the sort_values() method sorts the data by a specified column.

Here’s an example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'name': ['Book', 'Pen', 'Laptop'],
    'price': [10, 5, 1000],
    'rating': [4.5, 3.2, 4.7]
})

# Filter and sort
result = df.query('rating >= 4').sort_values('price')

print(result)

Output:

     name  price  rating
0   Book     10     4.5
2  Laptop   1000     4.7

The code uses a DataFrame to represent data and utilizes the query() method to filter rows where the rating is at least 4. It then sorts the remaining data by the price column using the sort_values() method.

Method 3: Using the Filter Function

The built-in filter() function allows for an alternative way to filter data in Python. It applies a function to each item of an iterable (such as a list) and returns an iterator for the items where the function returns True. Subsequent sorting can be applied to the resulting iterator.

Here’s an example:

products = [('Book', 10, 4.5), ('Pen', 5, 3.2), ('Laptop', 1000, 4.7)]

# Filter function
def filter_criteria(product):
    return product[2] >= 4

filtered_products = filter(filter_criteria, products)

# Sort the filtered list
sorted_products = sorted(filtered_products, key=lambda x: x[1])

print(list(sorted_products))

Output:

[('Book', 10, 4.5), ('Laptop', 1000, 4.7)]

This snippet demonstrates the use of the filter() function to select products with a rating of 4 or higher. It then applies the sorted() function to the resulting iterator, sorting by the second element in each tuple, which is the price.

Method 4: Using the SQL-like Queries in Pandas

In addition to the query method demonstrated in Method 2, Pandas supports SQL-like queries using the DataFrame’s query() method. This is a powerful feature for those who are familiar with SQL and prefer a query language approach to filtering and sorting data.

Here’s an example:

import pandas as pd

# Create a DataFrame
data = pd.DataFrame({
    'name': ['Book', 'Pen', 'Laptop'],
    'price': [10, 5, 1000],
    'rating': [4.5, 3.2, 4.7]
})

# SQL-like query to filter and sort
result = data.query("rating >= 4").sort_values(by='price', ascending=True)

print(result)

Output:

     name  price  rating
0   Book     10     4.5
2  Laptop   1000     4.7

This code performs the same operation as in Method 2 but emphasizes the SQL-like syntax in the query string passed to the query() method, highlighting the intuitive nature for users with a background in SQL.

Bonus One-Liner Method 5: Using Lambda Functions

Python’s lambda functions offer a way to write concise, one-liner code for filtering and sorting operations. Lambda functions are anonymous functions that are defined using the lambda keyword and can be used to create compact, in-line sorting and filtering logic.

Here’s an example:

data = [('Book', 10, 4.5), ('Pen', 5, 3.2), ('Laptop', 1000, 4.7)]

# One-liner to filter and sort
result = sorted(filter(lambda x: x[2] >= 4, data), key=lambda x: x[1])

print(result)

Output:

[('Book', 10, 4.5), ('Laptop', 1000, 4.7)]

This example combines the filter() and sorted() functions with lambda expressions to achieve filtering and sorting in a single, expressive line of code.

Summary/Discussion

  • Method 1: List Comprehensions with sorted(). Strengths: Highly readable and expressive, part of Python’s core features. Weaknesses: May not be as efficient with very large datasets.
  • Method 2: Pandas Library Usage. Strengths: Extremely powerful and feature-rich for complex data manipulations. Weaknesses: Requires external library installation and is less memory-efficient for small tasks.
  • Method 3: Filter Function. Strengths: Offers a functional programming approach, which can be more intuitive for some developers. Weaknesses: Less concise than list comprehensions and may be slower due to function call overhead.
  • Method 4: SQL-like Queries in Pandas. Strengths: Familiar syntax for SQL users, leveraging Pandas’ powerful data frame operations. Weaknesses: Overhead of Pandas as an external dependency, might not be suitable for simple tasks.
  • Bonus Method 5: Lambda Functions. Strengths: Ideal for one-liner enthusiasts and small to medium-sized tasks. Weaknesses: Can be harder to read and comprehend, especially for complex filtering and sorting logic.