Efficient Strategies to Sort a List of Dictionaries by Multiple Keys in Python

💡 Problem Formulation:

Organizing complex data structures is a routine programming task. In Python, a common case is sorting a list of dictionaries by multiple keys: the list is ordered by a primary key, and records that tie on that key are then ordered by a secondary key. For example, given a list of employee records, we might want to sort them first by department (primary key) and then by surname (secondary key).

Method 1: Using the sorted() function with a lambda

A straightforward way to sort a list of dictionaries by multiple keys is to call sorted() with a lambda that returns a tuple of the values to compare. sorted() returns a new list and leaves the original untouched, and the order of the values in the tuple makes the key priority easy to read.

Here’s an example:

employees = [
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'}
]

sorted_employees = sorted(employees, key=lambda x: (x['department'], x['surname']))
print(sorted_employees)

Output:

[
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'}
]

Explanation:

In this code snippet, the sorted() function orders the employee dictionaries using the tuple returned by the lambda as each record’s sort key. Python compares tuples element by element, so ‘department’ acts as the primary key and ‘surname’ only breaks ties between records in the same department. All of this happens in a single sort, not in two separate ones.
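The tuple ordering that the key relies on can be checked directly. The following is a minimal sketch; the key_of() helper and the shortened records are introduced here only for illustration:

```python
# Tuples compare element by element: the second element matters only on a tie.
assert ('Engineering', 'Jones') < ('Marketing', 'Avery')  # decided by department
assert ('Engineering', 'Doe') < ('Engineering', 'Jones')  # tie on department, decided by surname

def key_of(record):
    """Hypothetical helper: builds the same (department, surname) key as the lambda."""
    return (record['department'], record['surname'])

records = [
    {'department': 'Marketing', 'surname': 'Avery'},
    {'department': 'Engineering', 'surname': 'Jones'},
    {'department': 'Engineering', 'surname': 'Doe'},
]

# Sorting the keys themselves shows the order the records would end up in.
keys = sorted(key_of(r) for r in records)
print(keys)
# [('Engineering', 'Doe'), ('Engineering', 'Jones'), ('Marketing', 'Avery')]
```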

Method 2: Using the sort() method with itemgetter

The list method sort(), combined with itemgetter() from the operator module, sorts a list in place by multiple keys with a concise, readable key function that is also efficient.

Here’s an example:

from operator import itemgetter

employees = [
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'}
]

employees.sort(key=itemgetter('department', 'surname'))
print(employees)

Output:

[
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'}
]

Explanation:

Here, itemgetter('department', 'surname') creates a callable that, for each dictionary, returns a tuple of the values stored under ‘department’ and ‘surname’. The list’s in-place sort() method uses this callable as the key, so the list ends up in the same order as in Method 1. Note that sort() modifies the list itself and returns None.
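To see what the key function produces, it can be called on a single record. This is a small sketch; the variable names get_key, record, key, and single are chosen here for illustration:

```python
from operator import itemgetter

get_key = itemgetter('department', 'surname')
record = {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'}

# With two or more arguments, itemgetter returns a tuple of the looked-up values.
key = get_key(record)
print(key)  # ('Engineering', 'Jones')

# With a single argument, it returns the bare value rather than a 1-tuple.
single = itemgetter('surname')(record)
print(single)  # Jones
```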

Method 3: Using attrgetter() with custom objects

For cases where the list of dictionaries might be replaced by a list of objects, using attrgetter() from the operator module allows sorting by multiple attributes of custom objects.

Here’s an example:

from operator import attrgetter

class Employee:
    def __init__(self, name, department, surname):
        self.name = name
        self.department = department
        self.surname = surname
    def __repr__(self):
        return f"Employee({self.name}, {self.department}, {self.surname})"

employees = [
    Employee('John', 'Engineering', 'Doe'),
    Employee('Jane', 'Marketing', 'Smith'),
    Employee('Dave', 'Engineering', 'Jones'),
    Employee('Mike', 'Marketing', 'Avery')
]

sorted_employees = sorted(employees, key=attrgetter('department', 'surname'))
print(sorted_employees)

Output:

[
    Employee(John, Engineering, Doe),
    Employee(Dave, Engineering, Jones),
    Employee(Mike, Marketing, Avery),
    Employee(Jane, Marketing, Smith)
]

Explanation:

The code defines a custom Employee class and sorts a list of its instances. The attrgetter() utility generates a function to extract the specified attributes which is then passed to the sorted() function.
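attrgetter() also accepts dotted names, which may help when a sort key lives on a nested object. The sketch below assumes a hypothetical Department class and a simplified Employee that holds a Department instance instead of a string; neither is part of the example above:

```python
from operator import attrgetter

class Department:
    """Hypothetical nested object, used only for this illustration."""
    def __init__(self, name):
        self.name = name

class Employee:
    def __init__(self, surname, department):
        self.surname = surname
        self.department = department  # a Department instance, not a string

staff = [
    Employee('Smith', Department('Marketing')),
    Employee('Jones', Department('Engineering')),
    Employee('Doe', Department('Engineering')),
]

# 'department.name' follows the attribute chain employee.department.name.
staff.sort(key=attrgetter('department.name', 'surname'))

ordered = [(e.department.name, e.surname) for e in staff]
print(ordered)
# [('Engineering', 'Doe'), ('Engineering', 'Jones'), ('Marketing', 'Smith')]
```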

Method 4: Sorting with complex sort conditions

When the sort involves non-trivial conditions, or different criteria for different keys, a named key function can be written and passed to sorted() through its key argument.

Here’s an example:

employees = [
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'}
]

def custom_sort(employee):
    return (employee['department'], employee['surname'])

sorted_employees = sorted(employees, key=custom_sort)
print(sorted_employees)

Output:

[
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'}
]

Explanation:

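This method introduces a separate function, custom_sort(), which is passed to sorted() as the key. In this example it returns the same (department, surname) tuple as the lambda in Method 1, but as a named function it can encapsulate any complex logic required for sorting and can be documented and tested on its own.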
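A key function becomes more useful once the criteria differ per key. The sketch below assumes a hypothetical 'salary' field, treats a missing salary as 0, and deliberately lowercases one surname to show case-insensitive comparison; all three are illustrative choices, not part of the original data:

```python
employees = [
    {'name': 'John', 'department': 'Engineering', 'surname': 'doe', 'salary': 90000},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith', 'salary': 70000},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones', 'salary': 95000},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'},  # no salary recorded
]

def custom_sort(employee):
    """Department ascending, salary descending, surname ascending (case-insensitive)."""
    salary = employee.get('salary', 0)  # assumption: a missing salary counts as 0
    # Negating a number reverses its order within an otherwise ascending sort.
    return (employee['department'], -salary, employee['surname'].casefold())

sorted_employees = sorted(employees, key=custom_sort)
names = [e['name'] for e in sorted_employees]
print(names)  # ['Dave', 'John', 'Jane', 'Mike']
```

Negation only works for numeric values; for a descending string key, the chained approach in Method 5 is one alternative.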

Bonus Method 5: Chained Sorting

Chained sorting applies one sort per key, starting with the least significant key and finishing with the most significant one. It is a neat trick that relies on the stability of Python’s sort algorithm.

Here’s an example:

employees = [
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'}
]

# Secondary key sort
employees.sort(key=lambda x: x['surname'])
# Primary key sort
employees.sort(key=lambda x: x['department'])

print(employees)

Output:

[
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'}
]

Explanation:

Because Python’s sort is stable (it maintains the relative order of records that compare equal), we can sort the list twice. First, we sort by the secondary key (surname), and then we sort again by the primary key (department). The second sort preserves the order of surnames within each department.
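Because each pass is a separate call, chained sorting can also mix directions per key. As a minimal sketch, the following sorts surnames in descending order within departments in ascending order; using itemgetter instead of a lambda here is just a stylistic choice:

```python
from operator import itemgetter

employees = [
    {'name': 'John', 'department': 'Engineering', 'surname': 'Doe'},
    {'name': 'Jane', 'department': 'Marketing', 'surname': 'Smith'},
    {'name': 'Dave', 'department': 'Engineering', 'surname': 'Jones'},
    {'name': 'Mike', 'department': 'Marketing', 'surname': 'Avery'},
]

# Secondary key first: surname, descending. reverse=True keeps the sort stable.
employees.sort(key=itemgetter('surname'), reverse=True)
# Primary key last: department, ascending. Ties keep the surname order from above.
employees.sort(key=itemgetter('department'))

surnames = [e['surname'] for e in employees]
print(surnames)  # ['Jones', 'Doe', 'Smith', 'Avery']
```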

Summary/Discussion

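  • Method 1: sorted() function with lambda. Strengths: Quick to write and readable; returns a new list and leaves the original unchanged. Weaknesses: The lambda adds a Python-level function call per element, which can make it slightly slower than itemgetter() on very large lists.
  • Method 2: sort() method with itemgetter(). Strengths: Concise and efficient, since itemgetter() is implemented in C in CPython. Weaknesses: Sorts in place, so the original order is lost; might be less intuitive for people unfamiliar with itemgetter().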
  • Method 3: attrgetter() with custom objects. Strengths: Ideal for object-oriented code. Weaknesses: Requires the overhead of class creation and may be overkill for simple tasks.
  • Method 4: Custom sort function. Strengths: Highly customizable and can handle complex conditions. Weaknesses: More verbose and potentially less clear.
  • Method 5: Chained sorting. Strengths: Utilizes the stable sort feature, is succinct, and allows a different direction per key. Weaknesses: Might be counterintuitive, requires understanding of sort stability, and needs one sorting pass per key.
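The performance remarks above are easy to check on your own data. The following is a rough timing sketch, not a benchmark: the synthetic records, the list size of 2,000, and the function names are all assumptions made for illustration, and the timings will vary by machine and Python version.

```python
import random
import timeit
from operator import itemgetter

random.seed(0)  # fixed seed so the synthetic data is reproducible
departments = ['Engineering', 'Marketing', 'Sales']
records = [
    {'department': random.choice(departments),
     'surname': f'Name{random.randrange(10_000):05d}'}
    for _ in range(2_000)
]

def by_lambda():
    return sorted(records, key=lambda x: (x['department'], x['surname']))

def by_itemgetter():
    return sorted(records, key=itemgetter('department', 'surname'))

def by_chained():
    result = sorted(records, key=itemgetter('surname'))  # secondary key first
    result.sort(key=itemgetter('department'))            # primary key last
    return result

# All three approaches are stable, so they should agree on the final order.
same_order = by_lambda() == by_itemgetter() == by_chained()
print('same order:', same_order)

for fn in (by_lambda, by_itemgetter, by_chained):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds:.4f}s for 20 runs')
```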