5 Best Ways to Perform Word Break II in Python - Be on the Right Side of Change

💡 Problem Formulation: The ‘Word Break II’ problem in Python involves taking a given string and a dictionary of word candidates, then breaking the string into all possible unique word sequences that exist within the dictionary. For example, given the string “catsanddog” and a dictionary [“cat”, “cats”, “and”, “sand”, “dog”], the output should be a list of strings such as [“cats and dog”, “cat sand dog”].

Method 1: Dynamic Programming with Backtracking

The Dynamic Programming with Backtracking method involves using a memoization dictionary to store intermediate results and a recursive function that backtracks through the string to construct all possible word combinations based on the provided dictionary. It is an efficient way of solving ‘Word Break II’ for small to medium-sized strings.

Here’s an example:

def word_break(s, word_dict):
    return _word_break_recur(s, word_dict, {})

def _word_break_recur(s, word_dict, memo):
    if s in memo:
        return memo[s]
    if not s:
        return [""]
    res = []
    for word in word_dict:
        if s.startswith(word):
            for sub_s in _word_break_recur(s[len(word):], word_dict, memo):
                res.append(word + ' ' + sub_s) if sub_s else res.append(word)
    memo[s] = res
    return res

s = "catsanddog"
word_dict = ["cat", "cats", "and", "sand", "dog"]
print(word_break(s, word_dict))

Output:

["cats and dog", "cat sand dog"]

This code snippet defines a function word_break with memoization to store computed values and avoid redundant work during recursion. It checks all the prefixes of the string that are present in the dictionary and recursively processes the suffix. The result of each recursive call is added to the current prefix to form sentences that are stored and returned.

Method 2: Iterative with Queue

Using an iterative approach with a queue, we can explore all possible sentence combinations in a breadth-first manner. This method keeps track of the start index of the remaining string and constructs sentence combinations iteratively which are then checked against the dictionary.

Here’s an example:

from collections import deque

def word_break(s, word_dict):
    word_set = set(word_dict)
    queue = deque([(s, "")])
    res = []
    
    while queue:
        s, path = queue.popleft()
        if not s:
            res.append(path.strip())
        for word in word_set:
            if s.startswith(word):
                queue.append((s[len(word):], path + ' ' + word))
    
    return res

s = "catsanddog"
word_dict = ["cat", "cats", "and", "sand", "dog"]
print(word_break(s, word_dict))

Output:

["cats and dog", "cat sand dog"]

The iterative method initializes a queue with the given string and an empty path for constructing sentences. It extracts partial sentences and checks if any words from the dictionary can be appended. If the remaining string is empty, it adds the constructed sentence to the results. This method is more intuitive than recursion for some and can be easier to debug.

Method 3: Trie and DFS

The Trie and Depth-First Search (DFS) method utilizes a trie data structure to store the dictionary words for quick prefix lookups, combined with a deep recursive search to construct all possible sentence combinations. This can be particularly efficient when working with a large dictionary.

Here’s an example:

class TrieNode:
    def __init__(self):
        self.word = False
        self.children = {}
        
def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.word = True
    return root
        
def word_break(s, word_dict):
    root = build_trie(word_dict)

    def dfs(s, node, path, res):
        if not s:
            if node.word:
                res.append(path[:-1])
            return
       
        if node.word:
            dfs(s, root, path, res)
        
        if s[0] in node.children:
            dfs(s[1:], node.children[s[0]], path + s[0], res)

    res = []
    dfs(s, root, "", res)
    return res

s = "catsanddog"
word_dict = ["cat", "cats", "and", "sand", "dog"]
print(word_break(s, word_dict))

Output:

["cats and dog", "cat sand dog"]

In this method, we first construct a trie that represents our word dictionary. The DFS function is then used to recursively explore all possible paths in the trie that match sub-portions of the string. When the end of the string is reached, we check if it corresponds to a word in the trie and append the constructed path to the results. This method efficiently handles word lookups.

Method 4: Using a Graph

Building a graph representation of the problem allows for the application of graph traversal algorithms to find all possible sentence combinations from the string and dictionary. This method transforms the string into a graph of nodes representing start and end indices of dictionary words within the string. A search algorithm then finds all paths from the beginning to the end of the string.

Here’s an example:

def word_break(s, word_dict):
    word_set = set(word_dict)
    graph = {i: [] for i in range(len(s) + 1)}
    
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            if s[i:j] in word_set:
                graph[i].append(j)
    
    def dfs(start, path, res):
        if start == len(s):
            res.append(' '.join(path))
        for end in graph[start]:
            dfs(end, path + [s[start:end]], res)
    
    res = []
    dfs(0, [], res)
    return res

s = "catsanddog"
word_dict = ["cat", "cats", "and", "sand", "dog"]
print(word_break(s, word_dict))

Output:

["cats and dog", "cat sand dog"]

This code uses a graph structured as a dictionary of lists to store possible end indices for substrings starting at index i that are present in the word dictionary. The dfs function is a depth-first search that constructs all sentence paths through the graph. When the end of the string is reached, the path is added to the result list.

Bonus One-Liner Method 5: The Recursive Exhaustive Search (Inefficient for large inputs)

The Recursive Exhaustive Search is the most straightforward approach and involves recursively checking every possible word combination without any pruning or memoization, leading to a lot of redundant calculations. This method is not recommended for large inputs or dictionaries.

Here’s an example:

def word_break(s, word_dict):
    def dfs(s):
        if not s:
            return [""]
        result = []
        for word in word_dict:
            if s.startswith(word):
                for rest in dfs(s[len(word):]):
                    result.append(word + (rest and ' ' + rest))
        return result

    return dfs(s)

s = "catsanddog"
word_dict = ["cat", "cats", "and", "sand", "dog"]
print(word_break(s, word_dict))

Output:

["cats and dog", "cat sand dog"]

This bonus method employs a concise recursive function that simply tries to append every word in the dictionary to the solution path if it matches the beginning of the string. Due to its simplicity, it’s easy to understand but is highly inefficient, especially with large strings and dictionaries, due to the exponential number of recursive calls.

Summary/Discussion

Method 1: Dynamic Programming with Backtracking. Provides an optimal solution by avoiding redundant calculations. However, it can be complex to understand and implement.
Method 2: Iterative with Queue. It is intuitive and easy to follow, but might not be as efficient as the dynamic programming approach for very large problem instances.
Method 3: Trie and DFS. Offers fast lookup times for words and is more memory-efficient for larger dictionaries. The downside is the complexity of constructing and traversing the trie.
Method 4: Using a Graph. Transforms the problem into a well-known graph problem, which may be beneficial in terms of understanding and applying known graph algorithms. Requires additional space for graph representation.
Bonus Method 5: The Recursive Exhaustive Search. It is easy to implement and understand but highly inefficient for practical use in all but the smallest datasets.