💡 Problem Formulation: Imagine having a 2D matrix of letters representing a grid similar to a crossword puzzle. The task is to write a program that counts the number of valid English words that can be constructed by sequentially connecting adjacent (horizontally, vertically, or diagonally) letters without reusing any cell. For example, given a matrix like [['a','b'],['c','d']] and a word list ['ab', 'abc', 'abd', 'ca'], the desired output would be 4, since every word in the list can be traced along a path of adjacent cells: 'ab', 'abc', 'abd', and 'ca'.
Method 1: Depth-First Search (DFS)
A standard approach to solving such problems is using Depth-First Search (DFS) to explore all possible paths in the matrix. This recursion-based method traverses through adjacent cells to construct words and checks against a provided dictionary of valid words.
Here’s an example:
```python
def dfs(matrix, word, path, x, y, visited):
    if x < 0 or y < 0 or x >= len(matrix) or y >= len(matrix[0]) or (x, y) in visited:
        return
    word += matrix[x][y]
    if is_valid_word(word):
        path.add(word)
    visited.add((x, y))
    directions = [(1,0), (-1,0), (0,1), (0,-1), (1,1), (1,-1), (-1,1), (-1,-1)]
    for dx, dy in directions:
        # Each branch gets its own copy of the visited set.
        dfs(matrix, word, path, x + dx, y + dy, visited.copy())

# Imagine this is a previously defined function that returns True if word is valid.
def is_valid_word(word):
    return word in ['ab', 'abd', 'ca', 'abc']

matrix = [['a','b'],['c','d']]
words_found = set()
for i in range(len(matrix)):
    for j in range(len(matrix[0])):
        dfs(matrix, '', words_found, i, j, set())
print(len(words_found))
```

Output: 4
This code snippet defines a recursive dfs() function that takes the matrix, the word constructed so far, the set of found words, the current coordinates, and the visited cells as arguments. From each cell it explores all eight possible directions, checks every prefix against the is_valid_word() function, and adds any matches to the set of found words.
Method 2: Trie and DFS
By leveraging a trie (prefix tree), we can optimize our search. The trie can quickly determine if the current letter sequence is a prefix to any valid word, which aids in pruning the search space.
Here’s an example:
```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.end_of_word = False

def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for letter in word:
            if letter not in node.children:
                node.children[letter] = TrieNode()
            node = node.children[letter]
        node.end_of_word = True
    return root

def trie_dfs(matrix, node, x, y, visited):
    # Check bounds first, then prune any path that is not a prefix in the trie.
    if x < 0 or y < 0 or x >= len(matrix) or y >= len(matrix[0]) \
            or (x, y) in visited or matrix[x][y] not in node.children:
        return 0
    visited.add((x, y))
    result = 1 if node.children[matrix[x][y]].end_of_word else 0
    directions = [(1,0), (-1,0), (0,1), (0,-1), (1,1), (1,-1), (-1,1), (-1,-1)]
    for dx, dy in directions:
        result += trie_dfs(matrix, node.children[matrix[x][y]], x + dx, y + dy, visited)
    visited.remove((x, y))  # backtrack instead of copying the set per call
    return result

trie = build_trie(['ab', 'abd', 'ca', 'abc'])
matrix = [['a','b'],['c','d']]
count = 0
for i in range(len(matrix)):
    for j in range(len(matrix[0])):
        count += trie_dfs(matrix, trie, i, j, set())
print(count)
```

Output: 4
The code above first constructs a trie from the list of valid words using the build_trie() function. It then uses trie_dfs() to traverse the matrix, descending the trie in lock-step with the path so that dead-end prefixes are pruned immediately. The count represents the number of matches found; note that it counts one match per path, so a word reachable along several distinct paths would be counted more than once (collect matched words in a set if each word should count only once).
Method 3: Dynamic Programming with Memoization
Dynamic programming can be employed to reduce the overhead of revisiting the same cells multiple times in some cases, using a memoization technique to remember previous results.
Here’s an example:
```python
# Imagine this setup has occurred from previous examples.
def dp_dfs(matrix, word, path, x, y, visited, memo):
    if (x, y, word) in memo:
        return memo[(x, y, word)]
    # The rest of the DFS follows the structure of Method 1,
    # with 'memo' being updated for each recursive call.
```

The code snippet illustrates the addition of a memoization dictionary to the DFS method, which stores intermediate results keyed by the current coordinates and the word built so far. This dictionary prevents redundant computations by returning the stored result if available. Be aware that a key of (x, y, word) ignores which cells the current path has already visited, so for strict correctness the visited set must also be part of the key.
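Since the article leaves this method as a stub, here is a minimal runnable sketch of the idea. The cache key here includes a frozenset of the visited cells (a design choice not spelled out in the original), which keeps the cache sound at the cost of rare hits on grids without repeated letters:

```python
def dp_dfs(matrix, word, x, y, visited, memo, valid_words):
    """Return the set of valid words formable by extending `word` through (x, y)."""
    rows, cols = len(matrix), len(matrix[0])
    if not (0 <= x < rows and 0 <= y < cols) or (x, y) in visited:
        return set()
    # Include the visited cells in the key: two paths reaching (x, y) with the
    # same word may have blocked different cells, so (x, y, word) alone is unsound.
    key = (x, y, word, frozenset(visited))
    if key in memo:
        return memo[key]
    word += matrix[x][y]
    found = {word} if word in valid_words else set()
    visited = visited | {(x, y)}
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:  # skip the (0, 0) non-move
                found = found | dp_dfs(matrix, word, x + dx, y + dy,
                                       visited, memo, valid_words)
    memo[key] = found
    return found

valid_words = {'ab', 'abd', 'ca', 'abc'}
matrix = [['a', 'b'], ['c', 'd']]
memo, results = {}, set()
for i in range(len(matrix)):
    for j in range(len(matrix[0])):
        results |= dp_dfs(matrix, '', i, j, set(), memo, valid_words)
print(len(results))  # prints 4
```

On this tiny grid the cache rarely fires; memoization pays off mainly on larger grids with repeated letters, where distinct cell orderings can produce the same (cell, word, visited) state.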
Method 4: Breadth-First Search (BFS)
Breadth-First Search (BFS) can also be applied in this context to systematically explore all possible paths from each starting letter. It is less commonly used for this problem because the queue can hold many partial paths at once, increasing memory consumption, but it remains a viable alternative.
Here’s an example:
# Similar BFS implementation to DFS but using a queue instead of recursion.
The BFS implementation would use a queue to manage the exploration of the paths, sequentially visiting each adjacent cell and keeping track of the constructed words. The main difference lies in the breadth-wise expansion rather than depth-wise.
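The article does not provide the BFS code, so here is a minimal sketch under the same assumptions as Method 1: each queue entry carries the current cell, the word built so far, and the cells visited along that particular path.

```python
from collections import deque

def bfs_count(matrix, valid_words):
    rows, cols = len(matrix), len(matrix[0])
    directions = [(1,0), (-1,0), (0,1), (0,-1), (1,1), (1,-1), (-1,1), (-1,-1)]
    found = set()
    # Seed the queue with every cell as a length-1 path.
    queue = deque((i, j, matrix[i][j], frozenset([(i, j)]))
                  for i in range(rows) for j in range(cols))
    max_len = max(len(w) for w in valid_words)  # no word can be longer than this
    while queue:
        x, y, word, visited = queue.popleft()
        if word in valid_words:
            found.add(word)
        if len(word) >= max_len:
            continue  # prune paths that cannot spell any remaining word
        for dx, dy in directions:
            nx, ny = x + dx, y + dy
            if 0 <= nx < rows and 0 <= ny < cols and (nx, ny) not in visited:
                queue.append((nx, ny, word + matrix[nx][ny],
                              visited | {(nx, ny)}))
    return len(found)

matrix = [['a', 'b'], ['c', 'd']]
print(bfs_count(matrix, {'ab', 'abd', 'ca', 'abc'}))  # prints 4
```

Using frozenset per queue entry makes each path's visited set immutable and cheap to extend, at the cost of holding one set per in-flight path, which is exactly the memory trade-off noted above.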
Bonus One-Liner Method 5: Use of itertools and Built-in Functions
For smaller matrices and dictionaries, Python’s itertools and built-in functions can combine to create a one-liner that’s both elegant and straightforward, though possibly less efficient for large datasets.
Here’s an example:
```python
from itertools import permutations, product

# Assuming 'words' is the list of valid words and 'matrix' is defined as before.
words = ['ab', 'abd', 'ca', 'abc']
matrix = [['a','b'],['c','d']]
cells = list(product(range(len(matrix)), range(len(matrix[0]))))
print(sum(any(all(max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1 for a, b in zip(p, p[1:]))
              and ''.join(matrix[x][y] for x, y in p) == word
              for p in permutations(cells, len(word)))
          for word in words))
```

Output: 4

This concise, albeit dense, expression uses permutations from itertools to enumerate every sequence of distinct cells of the required length (which rules out cell reuse), keeps only sequences whose consecutive cells are adjacent (a Chebyshev distance of 1 covers all eight directions), and counts each word that some such path spells out.
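For readability, the same enumerate-and-check idea can be unrolled into a small helper function (a sketch using the example matrix and word list; the name count_words is ours):

```python
from itertools import permutations, product

def count_words(matrix, words):
    rows, cols = len(matrix), len(matrix[0])
    cells = list(product(range(rows), range(cols)))
    total = 0
    for word in words:
        # Permutations yield sequences of *distinct* cells, so no cell is reused.
        for path in permutations(cells, len(word)):
            adjacent = all(max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1
                           for a, b in zip(path, path[1:]))
            if adjacent and ''.join(matrix[x][y] for x, y in path) == word:
                total += 1
                break  # count each word at most once
    return total

print(count_words([['a','b'],['c','d']], ['ab', 'abd', 'ca', 'abc']))  # prints 4
```

This brute force enumerates all P(n, k) cell orderings per word, so it is only practical for very small grids and short words, which matches the caveat above.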
Summary/Discussion
- Method 1: Depth-First Search: Standard and robust, and simple to implement. Explores every path, so it can be slow on large matrices without pruning.
- Method 2: Trie and DFS: Highly efficient with pruning. Requires additional code for trie implementation. Best for a large dictionary of words.
- Method 3: Dynamic Programming with Memoization: Optimizes repeated subproblems. Can be complex to implement correctly. Best when many paths overlap in parts.
- Method 4: Breadth-First Search: Systematic and easier to understand. May consume more memory. Can be slower for large matrices.
- Bonus Method 5: itertools and Built-in Functions: Elegant one-liner. Less efficient for large datasets. Can be difficult to understand and debug.
