5 Best Ways to Merge K Sorted Lists in Python

πŸ’‘ Problem Formulation: Imagine you have multiple sorted lists, and your goal is to combine them into a single sorted list. For example, if your input is [[10, 20], [30], [40, 50, 60]], the desired output would be [10, 20, 30, 40, 50, 60]. This article walks through different methods to achieve that in Python.

Method 1: Using heapq.merge

Python’s heapq module provides a powerful function merge which merges multiple sorted inputs into a single sorted output and returns an iterator over the sorted values. It is efficient and doesn’t require additional space to store the elements.

Here’s an example:

import heapq

lists = [[10, 20], [30], [40, 50, 60]]
merged_list = list(heapq.merge(*lists))
print(merged_list)

Output:

[10, 20, 30, 40, 50, 60]

This code snippet leverages the heapq.merge function, which internally uses a min-heap to efficiently iterate over all lists and yield the smallest elements one at a time, resulting in a merged sorted list without fully sorting the combined list.

Method 2: Using itertools.chain and sorted

The itertools.chain function can be used to concatenate the lists and then sort the combined list using the sorted function. This method is very straightforward and takes just one line, with a clear intent.

Here’s an example:

from itertools import chain

lists = [[10, 20], [30], [40, 50, 60]]
merged_list = sorted(chain(*lists))
print(merged_list)

Output:

[10, 20, 30, 40, 50, 60]

In this example, chain(*lists) combines all the lists into one sequence and sorted() applies a complete sort to return the final merged sorted list. It is simple to understand and write but can be less efficient for large datasets due to the sorting of the entire combined list.

Method 3: Use a Priority Queue

A priority queue or heap queue can be manually managed to merge multiple sorted lists. Each list’s smallest item can be pushed into the queue initially, and as items are popped out, subsequent items from the same list are pushed. This maintains the sorted order.

Here’s an example:

import heapq

lists = [[10, 20], [30], [40, 50, 60]]

pq = []
for l in lists:
    heapq.heappush(pq, (l[0], l))

merged_list = []
while pq:
    val, list_ = heapq.heappop(pq)
    merged_list.append(val)
    if list_[1:]:
        heapq.heappush(pq, (list_[1], list_[1:]))

print(merged_list)

Output:

[10, 20, 30, 40, 50, 60]

In this snippet, each list’s first element along with a reference to the list itself is pushed into a priority queue. When the minimum value is popped from the queue, the next value from that list (if any) is pushed into the queue. This maintains a sort order with efficient insertions and deletions.

Method 4: Merge Sort Inspired Approach

Merge sort is a classic sorting algorithm that can be adapted to merge k sorted lists. This involves pairwise merging of the lists until only one list remains. The advantage of this approach is the utilization of merge sort’s efficient merging process for sorted arrays.

Here’s an example:

def merge_two_lists(l1, l2):
    result = []
    i = j = 0
    while i < len(l1) and j < len(l2):
        if l1[i]  1:
    lists.append(merge_two_lists(lists.pop(0), lists.pop(0)))

merged_list = lists[0]
print(merged_list)

Output:

[10, 20, 30, 40, 50, 60]

The merge_two_lists function is the core of the merge sort algorithm, merging two sorted lists into one. The while loop continuously applies this function to pairs of lists until just one remains. It is particularly effective for medium-sized datasets and keeps memory usage low due to the two-way merging.

Bonus One-Liner Method 5: Using Python’s Built-in Functions

For those who prefer a one-liner solution, Python’s built-in sorted function can be combined with list expansion to concatenate and sort lists in a single expression. It’s elegant and very readable but internally does a full sort, which can be inefficient for large dataset sizes.

Here’s an example:

lists = [[10, 20], [30], [40, 50, 60]]
merged_list = sorted([num for sublist in lists for num in sublist])
print(merged_list)

Output:

[10, 20, 30, 40, 50, 60]

This solution makes use of a list comprehension to flatten the list of lists into a single list, which is then sorted using sorted(). It’s a concise, functional approach, and is easily readable, but as mentioned, may not be the most efficient.

Summary/Discussion

  • Method 1: heapq.merge. Efficient for large datasets. Minimizes memory usage by not combining the lists. Only works well with sorted lists initially.
  • Method 2: itertools.chain with sorted. Very simple and readable. Performs a costly full sort of the combined lists, which can be inefficient with large data.
  • Method 3: Priority Queue. Manages complexity well and maintains sort order with good scalability. Code is more complex than other one-liners.
  • Method 4: Merge Sort Approach. Optimizes the merge process making it fast for medium-sized datasets. Requires pairwise merging which could be slightly less efficient than the heapq merge for large datasets.
  • Method 5: Python Built-ins One-Liner. Elegant and very concise. Unsuitable for very large datasets due to the overhead of a complete sort operation.