5 Best Ways to Find the First Occurrence of One List in Another Using Python

💡 Problem Formulation: How can we determine the index at which a sublist appears for the first time within a larger list in Python? Suppose we have two lists, list_a = [1, 2, 3, 4, 5] and list_b = [3, 4]. We want to find the starting index of list_b in list_a, which in this case is 2.

Method 1: Brute Force Search

This method involves manually scanning the larger list for the sequence of elements found in the sublist. This is akin to sliding the smaller list across the larger one and checking for a match at each position. It’s simple to understand and implement but can be inefficient for large lists due to its O(n*m) complexity, where n is the length of the larger list and m is the length of the smaller list.

Here’s an example:

def find_sublist(sublist, the_list):
    for i in range(len(the_list) - len(sublist) + 1):
        if the_list[i:i+len(sublist)] == sublist:
            return i
    return -1

index = find_sublist([3, 4], [1, 2, 3, 4, 5])
print(index)

Output: 2

The function find_sublist() iterates through the larger list and checks if any slice of the same length as the sublist is equal to the sublist itself. The index at which this occurs is returned, otherwise -1 is returned if the sublist is not found.

Method 2: Using the find() Method with String Conversion

This method involves converting both lists to strings and using the string method find() to locate the first occurrence. This is a quick and simple method for small and simple data structures, but it may not be suitable for lists containing complex objects that cannot be reliably converted to strings.

Here’s an example:

def find_sublist(sublist, the_list):
    str_sub = ' '.join(map(str, sublist))
    str_list = ' '.join(map(str, the_list))
    index = str_list.find(str_sub)
    if index != -1:
        return index // (len(str(sublist[0])) + 1)
    return -1

index = find_sublist([3, 4], [1, 2, 3, 4, 5])
print(index)

Output: 2

This code converts the lists into strings separated by spaces and then uses find() to locate the first occurrence. The function accounts for the space delimiter in the conversion when calculating the index to return.

Method 3: Using Itertools

The itertools library provides efficient tools for iterating through data. By utilizing islice(), we can efficiently compare slices of the larger list with the sublist. This method is more suitable for larger lists or iterators as it does not require computing the length of the list.

Here’s an example:

from itertools import islice

def find_sublist(sublist, the_list):
    sublist_len = len(sublist)
    the_list_slices = (islice(the_list, i, i + sublist_len) for i in range(len(the_list) - sublist_len + 1))
    for i, slice in enumerate(the_list_slices):
        if list(slice) == sublist:
            return i
    return -1

index = find_sublist([3, 4], [1, 2, 3, 4, 5])
print(index)

Output: 2

We create slices of the_list up to the length of the sublist using islice() and check each slice for equality with the sublist. The index is returned when the match is found.

Method 4: Using the KMP algorithm

The Knuth-Morris-Pratt (KMP) algorithm is a more advanced, efficient string-searching method that pre-processes the sublist to build a partial match table, achieving O(n) complexity. This is best suited for situations involving repeated searches or very long lists.

Here’s an example:

# KMP algorithm requires specific preprocessing
# Here's an example to illustrate the general idea:
def kmp_search(sublist, the_list):
    # Implementation of the KMP algorithm
    # Preprocess the sublist to create the partial match table and then search within the_list.
    pass

# Due to the complexity of the algorithm, the implementation is omitted.
# You can find KMP algorithm implementations in various online resources.

As the KMP algorithm is extensive to implement and requires understanding of the particular pattern processing, we have omitted the detailed code here, but implementations are widely available.

Bonus One-Liner Method 5: Built-in Support Using index() and slicing

Using Python’s built-in index() method in a one-liner, we can search for the first occurrence with a combination of slicing and index searching. This method is quick and easy for single occurrences and small lists, but it lacks the efficiency of some of the other methods for repeated searches or very large lists.

Here’s an example:

sublist = [3, 4]
the_list = [1, 2, 3, 4, 5]
index = next((i for i in range(len(the_list)) if the_list[i:i+len(sublist)] == sublist), -1)
print(index)

Output: 2

This one-liner uses a generator expression within the next() function to find the index. It enumerates over the larger list, checking if the sublist exists at each index and immediately returning the index or -1 if not found.

Summary/Discussion

Method 1: Brute Force Search. Simple implementation. May be inefficient for larger lists.
Method 2: Using the find() method with string conversion. Quick for smaller lists. Not suitable for complex object lists.
Method 3: Using Itertools. Efficient for larger lists. Can work with iterators without the need for list-length calculations.
Method 4: Using the KMP algorithm. Highly efficient for large data sets. Requires understanding of the algorithm to implement.
Method 5: Built-in Support Using index() and slicing. Compact one-liner. Not as efficient for large lists or repeated searches.