5 Best Ways to Merge K Sorted Lists in Python - Be on the Right Side of Change

💡 Problem Formulation: In this article, we address the task of combining multiple sorted lists into a single sorted output list in Python. For example, if we have input lists [1, 4, 5], [1, 3, 4], and [2, 6], our desired output would be the merged sorted list [1, 1, 2, 3, 4, 4, 5, 6]. This is a common problem in coding interviews and data processing applications.

Method 1: Brute Force Merge and Sort

This brute force approach combines all the lists into one list and then sorts it. While this is a straightforward method, it is not the most efficient due to the additional cost of sorting the combined list. It employs Python’s built-in sorted() function that provides a time complexity of O(n log n), where n is the total number of elements across all lists.

Here’s an example:

def merge_k_sorted_lists(lists):
    merged = []
    for lst in lists:
        merged.extend(lst)
    return sorted(merged)

list1 = [1, 4, 5]
list2 = [1, 3, 4]
list3 = [2, 6]
print(merge_k_sorted_lists([list1, list2, list3]))

Output:

[1, 1, 2, 3, 4, 4, 5, 6]

This method concatenates all elements from the k lists into one list and applies Python’s sorted() function to return a single, sorted list. The simplicity of this method makes it a good quick and dirty solution, but the sorting step can be redundant and inefficient for large datasets.

Method 2: Use the heapq Merge Function

The Python heapq module provides a merge() function that efficiently merges multiple sorted inputs into a single sorted iterator. This method is much more efficient than the brute force approach for large datasets as it does not require merging all lists into one before sorting. The function operates with a time complexity of O(n) due to the underlying use of heap data structures.

Here’s an example:

import heapq

list1 = [1, 4, 5]
list2 = [1, 3, 4]
list3 = [2, 6]
merged_iter = heapq.merge(list1, list2, list3)
print(list(merged_iter))

Output:

[1, 1, 2, 3, 4, 4, 5, 6]

By using heapq.merge(), we are able to combine sorted lists using an algorithmic approach that maintains order throughout the merge process. This method is suitable for iterating over merged elements without needing the full list in memory, ideal for large datasets.

Method 3: Priority Queue Based Merge

This method leverages a min heap, or priority queue, which always pops the smallest element out. In Python, the heapq module can be used to turn a list into a heap. We insert the first element of each list along with the index of the list into the heap. Then, we repeatedly pop the smallest element from the heap and add the next element from its corresponding list until all lists are exhausted.

Here’s an example:

import heapq 

def merge_k_sorted_lists(lists):
    min_heap = []
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(min_heap, (lst[0], i, 0))
    
    result = []
    while min_heap:
        val, list_idx, element_idx = heapq.heappop(min_heap)
        result.append(val)
        if element_idx + 1 < len(lists[list_idx]):
            next_tuple = (lists[list_idx][element_idx + 1], list_idx, element_idx + 1)
            heapq.heappush(min_heap, next_tuple)
    return result

list1 = [1, 4, 5]
list2 = [1, 3, 4]
list3 = [2, 6]
print(merge_k_sorted_lists([list1, list2, list3]))

Output:

[1, 1, 2, 3, 4, 4, 5, 6]

The priority queue-based merge method is a more efficient approach for merging k sorted lists. Every insert and removal from the min heap is done in logarithmic time relative to the number of lists, making it a preferable choice for performance-sensitive applications.

Method 4: Divide and Conquer

The divide and conquer method splits the problem into smaller subproblems of merging two lists at a time. By recursively splitting the list of lists until they are individual lists, we can merge pairs of lists efficiently. This strategy has a good balance between time complexity and ease of understanding, making it widely applicable.

Here’s an example:

def merge_two_sorted_lists(l1, l2):
    result = []
    i = j = 0
    while i < len(l1) and j < len(l2):
        if l1[i]  1:
        merged_lists = []
        for i in range(0, len(lists), 2):
            l1 = lists[i]
            l2 = lists[i + 1] if (i + 1) < len(lists) else []
            merged_lists.append(merge_two_sorted_lists(l1, l2))
        lists = merged_lists
    return lists[0]

list1 = [1, 4, 5]
list2 = [1, 3, 4]
list3 = [2, 6]
print(merge_k_sorted_lists([list1, list2, list3]))

Output:

[1, 1, 2, 3, 4, 4, 5, 6]

The divide and conquer method efficiently reduces the problem size at each step by merging pairs of lists. This method improves on space complexity as it doesn’t need additional space for the elements, just for the recursive calls.

Bonus One-Liner Method 5: Using the itertools chain Function

This is a concise one-liner solution that utilizes the itertools.chain() to combine the lists and then sorts the result. It is similar to the brute force approach but more elegant in its syntax. The chain() function is used for treating consecutive sequences as a single sequence.

Here’s an example:

from itertools import chain

list1 = [1, 4, 5]
list2 = [1, 3, 4]
list3 = [2, 6]
merged_sorted = sorted(chain(list1, list2, list3))
print(merged_sorted)

Output:

[1, 1, 2, 3, 4, 4, 5, 6]

The itertools chain-based method succinctly concatenates the input lists and sorts the combined elements. This solution is very expressive and Pythonic, but it still retains the brute force method’s disadvantage of sorting a combined list, making it less efficient for larger datasets or when the number of lists is high.

Summary/Discussion

Method 1: Brute Force Merge and Sort. Easy to understand and implement. Not as efficient for very large datasets.
Method 2: Use the heapq Merge Function. More efficient for large datasets. Requires understanding of the heapq module.
Method 3: Priority Queue Based Merge. Efficient in terms of both time and space complexities. More complex to implement.
Method 4: Divide and Conquer. Balances time complexity and ease of understanding. Great for moderate-sized datasets.
Bonus Method 5: Using the itertools chain Function. Most Pythonic and concise. Not as efficient with larger datasets.