Breadth-First Search (BFS) Algorithm in Python

What is a Graph?

When we want to illustrate how one thing relates to another, most often, we would preferably use a graph. From a concrete perspective, a graph is a graphical representation or an image of relationships. A graph is built of entities and their relationships. Entities represent actors in a relationship you are aiming to describe. Entities are usually depicted by a simple geometrical shape, such as a box, an oval, or a circle. When there is a relationship between any two entities, it is commonly illustrated by a line connecting the entities.

In computer science and mathematics, an entity is usually referred to as a node n or a vertex v (plural vertices), and a relationship is referred to as an arc a or an edge e. To ensure clarity and consistency, from now on we will stick with the terms “vertex/vertices” and “edge(s)”.

What is the Purpose of a Graph?

A common purpose of a graph is to help visualize or coherently organize a system of any degree of complexity, such as a manufacturing process, a network of any kind, i.e. in a social, professional, technical, or natural environment. Although, in some of these cases, a more specialized type of graph, a tree, is used to model hierarchical relationships. Another common purpose of a graph is finding the optimal/shortest path, under given conditions, between any two vertices.

How Does It Work?

A graph works by introducing an adequate number of vertices to represent all the entities you need and connecting them by their edges. Then, starting at an arbitrary vertex, all vertices that are directly or indirectly connected can eventually be reached by following their edges. This way, we can see what elements compose the path between any two observed vertices, and we can also notice the possible alternative paths.

Graph Definition

From a more abstract perspective, we define a graph as a set of vertices {a, b, c, d, …} and a collection of edges, i.e. pairs of vertices, e.g. (a, b), (a, c). A “collection of edges” part of the graph definition implies that we are allowing more than one edge with the same pair of vertices. In that case, we refer to them as parallel or multiple edges. However, we can replace the part of the definition “collection of edges” with the part “set of edges” and effectively restrict any edge from appearing more than once. A graph with such a definition is known as a simple graph.

A graph’s edge can be either directed or undirected. An undirected edge stands for a symmetrical relationship between the two vertices, meaning the relationship is identical in both ways. An example of a symmetric relationship can be found in social networking, where a is a friend of b and b is a friend of a. An undirected edge is expressed in a shorter form as (a, b) = (b, a). A directed edge stands for an asymmetrical relationship between the two vertices, meaning the relationship is not identical in both ways. An example of an asymmetric relationship is an arbitrary hierarchy, where a is superior to b, but b is inferior to a. Another example is a production process in a factory, where iron ore, limestone, and coal are processed together and steel is produced. However, steel cannot be processed in reverse to produce iron ore, limestone, and coal. A directed edge is expressed in a shorter form as (a, b).

A graph that contains only the directed edges is called a directed graph, or digraph. If only undirected edges constitute a graph, it is referred to as an undirected graph. The third kind of graph, containing both the directed and undirected edges is called a mixed graph.

Before we continue, we will add a few more terms to our basis of knowledge to easier comprehend what follows. When an undirected edge joins the two vertices, we call these vertices the endpoint vertices, end vertices, or simply just endpoints. On the other hand, a directed edge connects the origin vertex and the destination vertex. The two vertices are adjacent when there is at least one edge connecting the vertices. An edge is adjacent to a vertex when the vertex is one of its endpoints. When a vertex is an origin of a directed edge, we call it an outgoing edge. Contrary, when a vertex represents a destination of a directed edge, we say it is an incoming edge. An out-degree of a vertex, or shorter, outdeg(a), stands for the total number of its outgoing edges. An in-degree of a vertex a, or shorter, indeg(a) represents the total number of its incoming edges. The degree of a vertex a, deg(a) is the total number of its adjacent edges.

How is a Graph Implemented?

We will implement our graph by introducing three complex user types, i.e. the Vertex class for the vertices, the Edge class for the edges, and the Graph class for the graph construction, unifying the former two types.

The Vertex object holds a value representing a real-world object, an entity that forms a relationship with other entities. We will equip it with a method for accessing the containing value, entity().

In its simplest form and our current implementation, an Edge object contains only a pair of vertices (a, b). In more complex cases (common when modeling a real-world phenomenon), the Edge object may also contain additional information, describing how are the vertices connected. In essence, these complex cases assign weights or labels to the edges. We will equip our Edge object with methods endpoints() and opposite().

The Graph class is the top-level object consisting of both Vertex and Edge objects, organized into dictionaries. Its behavior is currently implemented via methods is_directed(), adjacent_edges(), add_vertex(), add_edge(), vertices(), and edges().

class Graph:

    def __init__(self, directed=False):
        self._outgoing = {}

        # If the graph is undirected, 'self._outgoing'
        # is the universal storage.
        self._incoming = {} if directed else self._outgoing

    # If the graph is directed, the 'self._incoming' 
    # dictionary differs from the 'self._outgoing'.
    def is_directed(self):
        return self._incoming is not self._outgoing

    # The function returns a generator of incoming
    # or outgoing (default) edges of a vertex.
    def adjacent_edges(self, vertex, outgoing=True):
        # References the corresponding outer dictionary
        # (dictionary of dictionaries)
        adj_edges = self._outgoing if outgoing else self._incoming

        # Access each of the edges for this endpoint vertex.
        for edge in adj_edges[vertex].values():
            yield edge

    def add_vertex(self, entity=None):
        # Constructs a new vertex from the entity.
        vertex = self.Vertex(entity)

        # The vertex becomes a key in the outer dictionary,
        # but the value is an internal dictionary (as we model
        # both dimensions for each edge: origin and destination).
        # e.g. {vertex_1a:{vertex_b:edge_a_b}, vertex_b:{vertex_c:edge_b_c}}.
        self._outgoing[vertex] = {}
        if self.is_directed():
            self._incoming[vertex] = {}

    def add_edge(self, origin, destination):
        # Constructs a new edge from the vertices.
        edge = self.Edge(origin, destination)

        # Adds the edge to the dictionary (dictionaries are
        # the same if the graph is undirected). The outer key
        # represents the origin, i.e. the component 'a' of
        # the edge-defining pair (a, b). The inner key stands
        # for the component 'b' of the edge-defining pair (a, b).
        self._outgoing[origin][destination] = edge
        
        # Even if the graph is undirected, each edge has to
        # be added twice, i.e. once for each of its endpoints.
        self._incoming[destination][origin] = edge

    def vertices(self):
        return self._outgoing.keys()

    def edges(self):
        # All the edges are collected into a set.
        result = set()
        for inner_dict in self._outgoing.values():
            result.update(inner_dict.values())
        return result


    class Vertex:
        __slots__ = '_entity'

        def __init__(self, entity):
            self._entity = entity

        # The real-world entity is represented by the Vertex object.
        def entity(self):
            return self._entity

        # We have to implement __hash__ to use 
        # the object as a dictionary key.
        def __hash__(self):
            return hash(id(self))


    class Edge:
        __slots__ = '_origin', '_destination'

        def __init__(self, origin, destination):
            self._origin = origin
            self._destination = destination

        def endpoints(self):
            return (self._origin, self._destination)

        # Returns the other component of the edge-defining pair (a, b)
        # for a given component a or b, respectively.
        def opposite(self, vertex):
            return self._destination if self._origin is vertex \
                else self._origin

        def __hash__(self):
            return hash((self._origin, self._destination))

What is a Breadth-First Search?

A breadth-first search is a graph traversal algorithm. It traverses the graph by organizing the vertices into levels and traverses the vertices one level per iteration.

What is Its Purpose?

The breadth-first search algorithm has various applications, such as finding the shortest path between any two reachable vertices in a network, solving optimization problems in scheduling, or searching for a winning strategy in a game resulting in a winning or losing state.

How Does BFS Work?

The breadth-first algorithm begins by marking the start vertex as visited and placing it into the map of visited nodes (level 0).

The algorithm then takes the next vertex from the map of visited vertices (currently populated only by the start vertex), going from the older ones toward the newer ones. It inspects the vertex by

  • 1. following one by one of the vertex’s edges,
  • 2. finding an immediate unvisited endpoint vertex,
  • 3. marking it as visited, and
  • 4. placing it into the map of visited vertices (level 1).

The algorithm progresses to the next level of visited vertices only after it finishes inspecting all the vertices on the current level. This way, the algorithm simulates a queue. The main property of a queue is that the first element that enters the queue is also the first element that leaves the queue. This property is commonly referred to as first-in-first-out, or shorter, FIFO. The process continues until all the vertices are inspected or the solution has been found.

What Are Its Properties?

Among others, the breadth-first search algorithm has two very interesting properties, which we will focus on.

The reachability property states that the traversal will visit all vertices that are reachable from the starting vertex. We can be sure of this, because if we begin our search from any starting vertex, and no vertex is disconnected from the rest of the graph, there is a direct path (one edge away) or an indirect path (multiple vertices and edges away) to reach any vertex.

The shortest path property states that given the start vertex a is on level 0, and the end vertex b is on level i, the path from a to b is i edges away, and any alternative path is at least i edges away. In other words, the number of levels separating the vertices a and b also define the shortest possible distance, and any path following these levels is also the shortest possible path. Any other path cannot be shorter than that, but it could be at least as long, or longer.

How is BFS Implemented in Python?

The implementation of our breadth-first search algorithm by a function BFS() has several parameters. The graph parameter expects an initialized Graph object. The start parameter takes the starting vertex, which we choose as we like (remember, a graph is not a tree, there is no absolute root). The visited parameter references a map, i.e. a dictionary of visited vertices whose values are the edges along the search path. The parameter is defined externally so that we can resume the search at a later moment and construct the search path. The target parameter is the entity we want to find in the graph, enclosed in a vertex. For a better understanding of the algorithm and implementation, each step is precisely described in the code below.

def BFS(graph, start, visited, target=None):
    # First-level searh includes only the 'start' vertex.
    level = [start]
    # The starting vertex is visited first and has no leading edges.
    # If we did not put it into 'visited' in the first iteration,
    # it would end up here during the second iteration, pointed to
    # by one of its children vertices as a previously unvisited vertex.
    visited[start] = None
    
    # Trivial check #1: searches for None are immediately terminated.
    if target is None:
        return target
    # Trivial check #2: if the entity is in the starting vertex.
    elif target == start.entity():
        return start
    
    # Propagates the search until all the vertices are visited.
    while len(level) > 0:
        # Candidates to be searched next (children of the vertex).
        next_level = []
        for v in level:
            # Explores every edge leaving the vertex 'v'.
            print(f'Searching from vertex: {v.entity()}...')
            for edge in graph.adjacent_edges(v):
                
                # Gets the second endpoint.
                v_2nd_endpoint = edge.opposite(v)
                
                # Examines the second endpoint.
                if v_2nd_endpoint not in visited:
                    # Adds the second endpoint to 'visited'
                    # and maps the leading edge for the 
                    # search path reconstruction.
                    visited[v_2nd_endpoint] = edge
                    
                    # If the entity is found, terminates the search.
                    if v_2nd_endpoint.entity() == target:
                        return v_2nd_endpoint
                    
                    # Otherwise, queues the second
                    # endpoint for the search.
                    next_level.append(v_2nd_endpoint)
                    print('  Vertex added for the next-level search: '
                          f'{v_2nd_endpoint.entity()}')
        # Refocuses on the next search candidates.
        level = next_level
    # If the search fails...
    return None
Before we can test the algorithm, we have to initialize a graph and build it by adding vertices and edges to it:
# Initializes an empty graph (object).
g = Graph()

# Loads the graph with the first ten vertices.
for i in range(10):
    g.add_vertex(i)

# Constructs the 'vertices' dictionary for a more
# convenient access during the graph construction.
vertices = {k.entity():k for k in g.vertices()}

# Constructs an arbitrary graph from
# the existing vertices and edgs.
g.add_edge(vertices[0], vertices[1])
g.add_edge(vertices[0], vertices[2])
g.add_edge(vertices[0], vertices[4])
g.add_edge(vertices[4], vertices[3])
g.add_edge(vertices[3], vertices[5])
g.add_edge(vertices[0], vertices[5])
g.add_edge(vertices[2], vertices[6])

# Initializes the visited dictionary
# and the search path.
visited = {}
path = []

Now that we have prepared everything, we can test the BFS() and see how it works. Here is the part of the code that runs the algorithm, constructs the search path (if there is one) and shows in step-by-step manner how it proceeds through the graph:

# Starts the search.
result = BFS(g, vertices[5], visited, 6)

# If the entity is found...
if result is not None:
    # The search path ends with the found vertex 
    # (entity). Each vertex is a container for
    # its real-world entity.
    path_vertex = result

    # The entity is added to the 'path'.
    path.append(path_vertex.entity())

    # Constructs the rest of the search path
    # (if it exists)...
    while True:
        # Gets a discovery edge 
        # leading to the vertex.
        path_edge = visited.get(path_vertex)

        # If the path vertex is the root, 
        # it has no discovery edge...
        if path_edge is None:
            break

        # Otherwise, gets the second
        # (parent vertex) endpoint.
        path_vertex = path_edge.opposite(path_vertex)

        # The entity is added to the 'path'.
        path.append(path_vertex.entity())
    print('Search path found:', end=' ')
    # The path is reversed and starts
    # with the root vertex.
    print(*reversed(path), sep=' -> ')

# Otherwise...
else:
    print('\nEntity is not found')

The test run gave us the output:

Searching from vertex: 5...
  Vertex added for the next-level search: 3
  Vertex added for the next-level search: 0
Searching from vertex: 3...
  Vertex added for the next-level search: 4
Searching from vertex: 0...
  Vertex added for the next-level search: 1
  Vertex added for the next-level search: 2
Searching from vertex: 4...
Searching from vertex: 1...
Searching from vertex: 2...
Search path found: 5 -> 0 -> 2 -> 6

Based on the output, we can see that the search started from the vertex 5, and that the BFS() has found the entity vertex 6. The entire shortest search path is also displayed.

Efficiency Analysis

The breadth-first search algorithm’s time complexity is O(|V| + |E|), where V stands for the number of vertices, and E for the number of edges. It will process each vertex once and each edge twice. It requires a constant amount of time in processing an edge.

The algorithm is less space-efficient than some other algorithms, because it stores a whole level of vertices to visit in the next iteration. This property gets very pronounced as the searching progresses through a densely connected graph with a large number of vertices.

The speed of the algorithm is appropriate for solutions relatively close to the starting vertex. If the solution is nested deep in the graph, the algorithm might take a very long run time, but eventually, it will find the solution.

Conclusion

In the first part of this article, we learned about graph structure. First, we introduced a notion of a graph structure, in terms of what it is and what it represents, along with some of the basic terms associated with it. Second, we described what is the purpose of a graph, i.e. how and where it is commonly used. Third, we explained how a graph works. Fourth, a more formal graph definition is given. Several additional terms are introduced and the basic types of graphs are listed. Fifth, we took a look at an implementation of a graph via three main Python classes.

After these introductory sections, in the sixth section, we introduced a breadth-first search algorithm. Seventh, we explained the main use and purpose of the breadth-first search algorithm. Eighth, we took a look at the algorithm’s main steps of operation. Ninth, the algorithm’s two key properties are mentioned and explained. In section ten, we look at how the algorithm is implemented, building on the previously established foundations of the graph implementation. We also tested the algorithm by calling its main function, BFS(), and analyzed its steps of execution. Eleventh, after seeing the algorithm work, we overviewed its efficiency and noticed that there are cases when the breadth-first search algorithm could be less suited for solving specific problems. However, we concluded that regardless of its efficiency, if the solution exists, the breadth-first search algorithm will always find it.