5 Best Ways to Implement Multiprocessing in Python

πŸ’‘ Problem Formulation: For tasks that can be executed in parallel, such as processing multiple data streams or performing calculations on chunks of a large dataset, Python’s Global Interpreter Lock (GIL) prevents threads from running CPU-bound code simultaneously. The goal is to run such work in separate processes, each with its own interpreter, thereby achieving true parallelism and reducing overall execution time.

Method 1: Using the multiprocessing Module

Python’s multiprocessing module lets you create processes that run independently and concurrently. Its API mirrors that of the threading module, but because each process gets its own interpreter and memory space, it sidesteps the GIL. The module also provides primitives for synchronizing processes and sharing data between them.

Here’s an example:

from multiprocessing import Process

def print_square(number):
    print(f'The square of {number} is {number * number}')

if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = Process(target=print_square, args=(i,))
        processes.append(p)
        p.start()

    for process in processes:
        process.join()

Output:

The square of 0 is 0
The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16

This snippet creates several processes with the Process class, each of which performs its computation concurrently. Each process is launched with start(), and the main process then calls join() on each one to wait until it has finished. Note the if __name__ == '__main__': guard, which is required on platforms that spawn new processes (such as Windows), and that the output lines may appear in a different order because the processes run concurrently.
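
Because each process has its own memory space, changes a child makes to module-level data are not visible to the parent. Here is a minimal sketch that makes this visible; the counter variable and bump function are purely illustrative:

from multiprocessing import Process

counter = 0  # lives in the parent's memory

def bump():
    global counter
    counter += 1  # modifies the child's own copy only
    print(f'In child: {counter}')

if __name__ == '__main__':
    p = Process(target=bump)
    p.start()
    p.join()
    print(f'In parent: {counter}')  # still 0: memory is not shared

This is exactly why Methods 3 and 4 below introduce explicit mechanisms (queues and shared memory) for passing data between processes.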

Method 2: Pool Class for a Pool of Workers

The Pool class from the multiprocessing module provides a convenient means of parallelizing the execution of a function across multiple input values by distributing input data across processes that run concurrently. It abstracts away much of the manual setup required when using Process directly.

Here’s an example:

from multiprocessing import Pool

def cube(number):
    return number * number * number

if __name__ == '__main__':
    with Pool(4) as pool:
        cubes = pool.map(cube, range(10))
        print(cubes)

Output:

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

Here, a pool of four worker processes maps the cube function over a range of numbers. Pool.map() behaves like the built-in map(), but it distributes the calls across the workers and collects the results in their original order, so the cubes are computed in parallel.
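
Pool.map() passes only a single argument to the worker function. If the function takes several arguments, Pool.starmap() (available since Python 3.3) unpacks each input tuple for you. A minimal sketch, with an illustrative multiply function:

from multiprocessing import Pool

def multiply(x, y):
    return x * y

if __name__ == '__main__':
    with Pool(4) as pool:
        # Each tuple is unpacked into the two arguments of multiply()
        results = pool.starmap(multiply, [(1, 2), (3, 4), (5, 6)])
        print(results)  # [2, 12, 30]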

Method 3: Process Communication with Queues

Processes do not share memory by default, so they need an explicit channel to collaborate. The multiprocessing.Queue class provides one: a FIFO (first-in, first-out) data structure that can be safely shared between multiple processes.

Here’s an example:

from multiprocessing import Process, Queue

def worker(queue, number):
    queue.put(f'Worker number {number} reporting!')

if __name__ == '__main__':
    q = Queue()
    workers = [Process(target=worker, args=(q, i)) for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    while not q.empty():
        print(q.get())

Output:

Worker number 0 reporting!
Worker number 1 reporting!
Worker number 2 reporting!

This code creates a queue and several worker processes that each put a message into it. Once all workers have been joined, the main process drains the queue and prints the messages; because the workers run concurrently, the messages may arrive in any order. This is a simple way to collect results from multiple processes through a shared Queue.
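
Queues also suit producer-consumer setups, where one process generates work and another consumes it as it arrives. The following sketch (the producer and consumer function names are illustrative) uses a None sentinel to tell the consumer when to stop:

from multiprocessing import Process, Queue

def producer(queue):
    for i in range(5):
        queue.put(i)
    queue.put(None)  # sentinel: no more items will follow

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:  # stop when the sentinel arrives
            break
        print(f'Consumed {item}')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    c = Process(target=consumer, args=(q,))
    p.start()
    c.start()
    p.join()
    c.join()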

Method 4: Sharing State Between Processes

To share state between processes, you can use the shared memory objects Value and Array from the multiprocessing module. These objects live in shared memory and can be read and modified by every process, so updates that are not atomic should be protected with a Lock.

Here’s an example:

from multiprocessing import Process, Value, Lock

def increment(shared_value, lock):
    # += on a shared Value is not atomic, so guard it with the lock
    with lock:
        shared_value.value += 1

if __name__ == '__main__':
    shared_num = Value('i', 0)
    lock = Lock()
    processes = [Process(target=increment, args=(shared_num, lock)) for _ in range(10)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

    print(f'The final value is {shared_num.value}')

Output:

The final value is 10

This code shares a numerical value between processes and uses a lock to make each increment atomic. The final result shows that all ten processes incremented the shared value without interfering with each other.
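
The Array object mentioned above works the same way for a fixed-size sequence of primitive values. A minimal sketch, with an illustrative doubling task:

from multiprocessing import Process, Array

def double_elements(shared_array):
    for i in range(len(shared_array)):
        shared_array[i] *= 2  # the child's writes land in shared memory

if __name__ == '__main__':
    arr = Array('i', [1, 2, 3, 4, 5])  # 'i' = signed int
    p = Process(target=double_elements, args=(arr,))
    p.start()
    p.join()
    print(arr[:])  # [2, 4, 6, 8, 10]

By default Array is created with an internal lock, so individual element accesses are synchronized; as with Value, compound updates from several processes still need explicit locking.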

Bonus One-Liner Method 5: Using concurrent.futures

The concurrent.futures module provides a high-level interface for executing callables asynchronously. ThreadPoolExecutor uses threads and ProcessPoolExecutor uses processes; for CPU-bound work it is the latter, built on top of multiprocessing, that sidesteps the GIL. Both share the same simple, easy-to-use API.

Here’s an example:

from concurrent.futures import ProcessPoolExecutor

def power(base, exponent):
    return base ** exponent

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        result = executor.submit(power, 2, 3)
        print(result.result())

Output:

8

In this snippet, ProcessPoolExecutor runs the power calculation in a separate process. submit() returns a Future, and calling its result() method blocks until the computation has finished. This is a minimalist approach to multiprocessing that hides most of the boilerplate.
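
For many inputs, ProcessPoolExecutor.map() plays the same role as Pool.map() from Method 2. A minimal sketch reusing the same power function, where each positional argument gets its own iterable:

from concurrent.futures import ProcessPoolExecutor

def power(base, exponent):
    return base ** exponent

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        # map() takes one iterable per argument of power()
        results = executor.map(power, [2, 3, 4], [2, 2, 2])
        print(list(results))  # [4, 9, 16]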

Summary/Discussion

  • Method 1: Using the multiprocessing Module. Offers full control over individual processes. Can be complex to set up for beginners. Best for CPU-intensive tasks.
  • Method 2: Pool Class for a Pool of Workers. Simplifies the distribution of a task across multiple inputs. Limited to simple, map-reduce style parallelism. Not suitable for tasks requiring inter-process communication.
  • Method 3: Process Communication with Queues. Enables safe communication between processes. Good for producer-consumer problems. Can be less efficient for heavy communication loads due to potential bottlenecks at the Queue.
  • Method 4: Sharing State Between Processes. Allows sharing of state. Requires careful synchronization to prevent race conditions. Good when state must be observed by multiple processes.
  • Bonus Method 5: Using concurrent.futures. Easy to use and implement. Provides a more modern interface for running tasks concurrently. Less control over the underlying processes compared to multiprocessing.