5 Best Ways to Find Number of Friend Groups in Friend Connections with Python

💡 Problem Formulation: Consider a scenario where you have a list of pairs, each representing a direct friendship between two individuals. The objective is to find the number of unique friend groups within this network. A friend group is defined as a set of individuals all connected directly or indirectly through friendships. Given input like [('A', 'B'), ('B', 'C'), ('D', 'E')], the desired output would be 2 since there are two distinct friend groups: one comprising A, B, and C, and the other comprising D and E.

Method 1: Using Depth-First Search (DFS)

Depth-First Search (DFS) is a fundamental graph traversal algorithm that can be used to explore all the friend connections for each individual recursively, marking friends as visited and counting the distinct friend groups. DFS is initiated for each unvisited node and continues as long as unvisited friends are found, effectively discovering whole friend groups.

Here’s an example:

def dfs(friends, node, visited):
    visited.add(node)
    for neighbor in friends[node]:
        if neighbor not in visited:
            dfs(friends, neighbor, visited)

def number_of_friend_groups(friendships):
    friends = {}
    for person1, person2 in friendships:
        friends.setdefault(person1, []).append(person2)
        friends.setdefault(person2, []).append(person1)
    visited = set()
    count = 0
    for person in friends:
        if person not in visited:
            dfs(friends, person, visited)
            count += 1
    return count

print(number_of_friend_groups([('A', 'B'), ('B', 'C'), ('D', 'E')]))

Output:

This code uses a dictionary to store the adjacency list representing the friend groups and a set to track visited nodes. The number_of_friend_groups function iterates through each node, calling dfs if the node is unvisited, thereby discovering a new friend group each time DFS is invoked. After completing the traversal, it returns the count of friend groups discovered.

Method 2: Using Union-Find Data Structure

The Union-Find data structure, also known as Disjoint Set, is effective for tracking a set of elements partitioned into multiple disjoint subsets. It is suitable for our problem as it can quickly merge sets and check if two individuals belong to the same friend group.

Here’s an example:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    
    def find(self, a):
        if self.parent[a] != a:
            self.parent[a] = self.find(self.parent[a])
        return self.parent[a]
    
    def union(self, a, b):
        pa = self.find(a)
        pb = self.find(b)
        if pa != pb:
            self.parent[pa] = pb

def number_of_friend_groups(friendships):
    ids_map = {person: id for id, person in enumerate(set(sum(friendships, ())))}
    uf = UnionFind(len(ids_map))
    for person1, person2 in friendships:
        uf.union(ids_map[person1], ids_map[person2])
    groups = set(uf.find(x) for x in range(len(ids_map)))
    return len(groups)

print(number_of_friend_groups([('A', 'B'), ('B', 'C'), ('D', 'E')]))

Output:

In this code example, the UnionFind class manages the disjoint sets. The number_of_friend_groups function first maps each person to a unique ID, then uses the Union-Find to merge sets corresponding to friendships. Finally, it counts the distinct root parents, which represent the number of friend groups.

Method 3: Using Breadth-First Search (BFS)

Breadth-First Search (BFS) is another graph traversal algorithm that uses a queue to explore neighbor nodes level by level. Similar to DFS but exploiting properties of BFS, each execution of BFS from an unvisited node will traverse all nodes in a friend group.

Here’s an example:

from collections import deque

def bfs(friends, start, visited):
    queue = deque([start])
    visited.add(start)
    while queue:
        node = queue.popleft()
        for neighbor in friends[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

def number_of_friend_groups(friendships):
    friends = {}
    for person1, person2 in friendships:
        friends.setdefault(person1, []).append(person2)
        friends.setdefault(person2, []).append(person1)
    visited = set()
    count = 0
    for person in friends:
        if person not in visited:
            bfs(friends, person, visited)
            count += 1
    return count

print(number_of_friend_groups([('A', 'B'), ('B', 'C'), ('D', 'E')]))

Output:

This method utilizes a similar structure to Method 1, but instead of a recursive DFS, it uses an iterative BFS approach. The number_of_friend_groups function initializes a queue with each unvisited node, and the BFS continues until all accessible friends from that node are visited, indicating a full friend group is counted.

Method 4: Using NetworkX Library

NetworkX is a Python library designed to work with complex networks. Utilizing NetworkX can simplify the task of identifying friend groups, as it contains built-in algorithms and data structures tailored for such analysis.

Here’s an example:

import networkx as nx

def number_of_friend_groups(friendships):
    G = nx.Graph()
    G.add_edges_from(friendships)
    return nx.number_connected_components(G)

print(number_of_friend_groups([('A', 'B'), ('B', 'C'), ('D', 'E')]))

Output:

This snippet shows how to use the NetworkX library to build a graph from edge pairs representing friendships and then use the number_connected_components function to directly obtain the number of friend groups. This method abstracts away all the graph traversal logic under the hood and provides a high-level, easy-to-use interface.

Bonus One-Liner Method 5: Using Connected Components with SciPy

If you are working with large sparse graphs, SciPy provides efficient ways to handle them. The connected_components function from the scipy.sparse.csgraph module can be used to determine friend groups within sparse matrices.

Here’s an example:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def number_of_friend_groups(friendships):
    persons = {person for pair in friendships for person in pair}
    id_map = {person: i for i, person in enumerate(persons)}
    n = len(persons)
    matrix = csr_matrix((n, n), dtype=int)
    for person1, person2 in friendships:
        matrix[id_map[person1], id_map[person2]] = 1
        matrix[id_map[person2], id_map[person1]] = 1
    n_components, labels = connected_components(csgraph=matrix, directed=False)
    return n_components

print(number_of_friend_groups([('A', 'B'), ('B', 'C'), ('D', 'E')]))

Output:

This one-liner utilizes SciPy’s sparse matrix representation and the connected_components function to identify distinct friend groups. By creating a matrix with rows and columns representing individuals and using 1’s to indicate direct friendship connections, the connected_components function efficiently finds all friend groups.

Summary/Discussion

Method 1: Depth-First Search. Strengths: Conceptually simple and doesn’t require additional libraries. Weaknesses: Recursive DFS can hit recursion limits for very large graphs.
Method 2: Union-Find Data Structure. Strengths: Highly efficient for dynamic connectivity problems. Weaknesses: Requires some setup and understanding of the data structure.
Method 3: Breadth-First Search. Strengths: Appropriate for expansive graphs and guarantees the shortest path discovery. Weaknesses: Slightly more complex due to the queue management.
Method 4: Using NetworkX Library. Strengths: Simplifies code and provides a straightforward interface. Weaknesses: Additional dependency and can be less efficient for very large sparse graphs.
Bonus Method 5: Using Connected Components with SciPy. Strengths: Optimized for sparse graphs and can handle very large data sets. Weaknesses: Requires familiarity with SciPy and sparse matrix operations.