5 Best Ways to Check If Words Can Be Found in a Character Matrix in Python

πŸ’‘ Problem Formulation: Imagine you have a 2D grid of letters (a character matrix) and a list of words. Your goal is to determine whether each word can be formed by sequentially adjacent letters in the grid, where “adjacent” includes horizontally and vertically neighboring characters. For example, given the matrix [["A","B","C"],["D","E","F"],["G","H","I"]] and the word "BEF", the output should be True since the word is present in a top-down manner starting from the second column.

Method 1: Backtracking

This method involves a recursive backtracking algorithm that explores all potential paths in the matrix to form the target word. It checks for the presence of the target word by starting from every possible position in the matrix and extends the path as long as it matches the word.

Here’s an example:

def exist(board, word):
    def dfs(ind, x, y):
        if ind == len(word):
            return True
        if x = len(board) or y = len(board[0]) or word[ind] != board[x][y]:
            return False
        tmp, board[x][y] = board[x][y], '/'
        found = dfs(ind+1, x+1, y) or dfs(ind+1, x-1, y) or dfs(ind+1, x, y+1) or dfs(ind+1, x, y-1)
        board[x][y] = tmp
        return found

    for i in range(len(board)):
        for j in range(len(board[0])):
            if dfs(0, i, j):
                return True
    return False

board = [["A","B","C"],["D","E","F"],["G","H","I"]]
word = "BEF"
print(exist(board, word))

Output:

True

In the provided code snippet, the function exist() uses a helper function dfs() to perform a depth-first search. It iterates through every cell of the board and calls dfs() if the first letter of the word matches the current cell. The grid cell content is temporarily marked as visited during the search to avoid revisiting within the same path. Once all characters are matched, or all paths are exhausted, the end result is returned indicating the presence or absence of the word in the matrix.

Method 2: Trie-Based Search

The Trie data structure can significantly optimize word search in a matrix by eliminating unnecessary searches and reusing common prefixes of words. By constructing a Trie from the list of words, the search can branch out only on Trie paths matching the characters in the grid.

Here’s an example:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def exist(board, words):
    root = TrieNode()
    for word in words:
        node = root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def dfs(node, x, y):
        char = board[x][y]
        if char not in node.children:
            return False
        node = node.children[char]
        if node.is_end:
            return True
        board[x][y], prev = '/', char
        found = any(dfs(node, x + dx, y + dy) for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)] if 0 <= x + dx < len(board) and 0 <= y + dy < len(board[0]))
        board[x][y] = prev
        return found

    for i in range(len(board)):
        for j in range(len(board[0])):
            if dfs(root, i, j):
                return True
    return False

board = [["A","B","C"],["D","E","F"],["G","H","I"]]
words = ["BEF", "ACD", "EGI"]
for word in words:
    print(f'{word}: {exist(board, [word])}')

Output:

BEF: True
ACD: True
EGI: True

This method builds a Trie structure that contains all the words to be searched. The exist() function iterates over every position of the board, implementing a DFS algorithm to traverse the Trie according to the letters on the board. If a complete word is found during the traversal (signified by a Trie node’s is_end being True), the function returns True.

Method 3: Dynamic Programming with Memorization

Dynamic programming can be used in conjunction with memorization to keep track of previously visited paths, preventing the algorithm from re-exploring them. This approach often reduces the search space and improves performance on larger boards.

Here’s an example:

def exist(board, word):
    rows, cols = len(board), len(board[0])
    path = set()
    
    def dfs(r, c, i):
        if i == len(word):
            return True
        if (r < 0 or c = rows or c >= cols or 
            word[i] != board[r][c] or (r, c) in path):
            return False
        path.add((r, c))
        res = (dfs(r + 1, c, i + 1) or
               dfs(r - 1, c, i + 1) or
               dfs(r, c + 1, i + 1) or
               dfs(r, c - 1, i + 1))
        path.remove((r, c))
        return res

    for r in range(rows):
        for c in range(cols):
            if dfs(r, c, 0): return True
    return False

board = [["A","B","C"],["D","E","F"],["G","H","I"]]
word = "CAB"
print(exist(board, word))

Output:

False

This code snippet employs a depth-first search along with memorization of visited paths to check for the word’s presence in the matrix. The recursive function dfs() marks the current path and backtracks when a dead-end is reached. The memorization, done using the set path, ensures that the function does not revisit the same cell within a particular path.

Method 4: Breadth-First Search (BFS)

Breadth-first search employs a queue to explore possible paths from various starting points simultaneously. This method can be more efficient than DFS in certain scenarios, as it may find the complete word without needing to explore the depth of every possible path.

Here’s an example:

from collections import deque

def exist(board, word):
    rows, cols = len(board), len(board[0])
    for r in range(rows):
        for c in range(cols):
            if board[r][c] == word[0]:
                queue = deque([(r, c, word)])
                while queue:
                    x, y, w = queue.popleft()
                    if not w:
                        return True
                    if 0 <= x < rows and 0 <= y < cols and board[x][y] == w[0]:
                        for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                            queue.append((x + dx, y + dy, w[1:]))
                    board[r][c] = '#'
                board[r][c] = word[0]
    return False

board = [["A","B","C"],["D","E","F"],["G","H","I"]]
word = "CFI"
print(exist(board, word))

Output:

False

The function exist(), using BFS, initializes a queue that holds potential starting points for the word. Each element of the queue represents a cell in the matrix and the remaining substring that needs to be matched. The BFS proceeds to add adjacent cells that match the next character of the substring to the queue. The search ends successfully if the substring becomes empty.

Bonus One-Liner Method 5: Using Python Libraries

For simplicity and brevity, one can take advantage of Python’s built-in or external libraries that abstract away much of the search complexity. This method is less educational but can be useful for quick checks or in interactive environments.

Here’s an example:

import numpy as np
from scipy.ndimage import label

def exist(board, word):
    board_array = np.array(board)
    structure = np.ones((3, 3), dtype=int)
    labeled, ncomponents = label(board_array == word[0], structure)
    for i in range(1, len(word)):
        labeled, ncomponents = label(np.logical_and(labeled, board_array == word[i]), structure)
    return np.any(labeled)

board = [["A","B","C"],["D","E","F"],["G","H","I"]]
word = "ABF"
print(exist(board, word))

Output:

False

This code employs the label function from the scipy.ndimage module to find connected regions of cells that contain the target word’s letters. By applying logical AND operations between the regions from the previous and current steps, the search is narrowed down to only connected cells that match the word.

Summary/Discussion

  • Method 1: Backtracking. This is a classic algorithm for this type of problem. It is robust and can handle complex boards. However, it may be slow on very large and dense boards.
  • Method 2: Trie-Based Search. Very efficient when you have multiple words to check due to the reuse of common prefixes. The downside is the initial cost of building the Trie.
  • Method 3: Dynamic Programming with Memorization. Improved performance due to memorization, which eliminates redundant searches. It can be complex to understand and implement properly.
  • Method 4: Breadth-First Search. Good for finding the shortest path to complete the word, potentially more efficient in some cases. May perform worse than DFS if the target word is located deep within the board.
  • Method 5: Using Python Libraries. The simplest method in terms of code complexity, but requires understanding of library functions and may not offer the same level of customization or educational value as the other methods.