💡 Problem Formulation: Python developers often need to speed up their applications by running tasks in parallel. Let’s say you have a list of URLs and you want to download them all as quickly and efficiently as possible. That’s a perfect scenario for executing parallel tasks. This article will guide you through five methods of accomplishing that in Python, improving performance for both I/O-bound and CPU-bound workloads.
Method 1: Using the threading module
Python’s built-in threading module enables the execution of multiple operations concurrently within a single Python process. Threads are lightweight and well-suited for I/O-bound tasks because they can make progress while other threads wait on external responses, making them efficient for operations like downloading files from the Internet.
Here’s an example:
import threading

def download_url(url):
    # Assume we have a function that downloads the URL
    print(f"Downloading {url}")

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

threads = []
for url in urls:
    t = threading.Thread(target=download_url, args=(url,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

print("Finished downloading all URLs.")
The output of this code snippet (the order of the lines may vary with thread scheduling):

Downloading http://example.com/a
Downloading http://example.com/b
Downloading http://example.com/c
Finished downloading all URLs.
This code snippet demonstrates how to use the threading module to create a thread for each URL to be downloaded. Each thread starts the download, and the main program waits for all threads to finish by calling join() on each one.
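For a more concrete picture, here is one way the placeholder download_url could do real work using only the standard library. Saving the response to a file named after the last path segment is an illustrative choice, not part of the original snippet:

import threading
import urllib.request

def download_url(url):
    # urlopen blocks on network I/O, which is exactly where threads help:
    # other threads keep running while this one waits.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # Illustrative: name the local file after the last path segment.
    filename = url.rsplit("/", 1)[-1] or "index.html"
    with open(filename, "wb") as f:
        f.write(data)
    print(f"Downloaded {url} ({len(data)} bytes)")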
Method 2: Using the multiprocessing module
Python’s multiprocessing module is useful for CPU-bound tasks that require heavy computation and can be distributed across multiple CPUs. It creates separate processes for parallel execution, bypassing the Global Interpreter Lock (GIL) in CPython and allowing true concurrent execution on multicore processors.
Here’s an example:
from multiprocessing import Pool

def process_data(data):
    # Some CPU-intensive processing
    print(f"Processing {data}")

if __name__ == "__main__":
    pool = Pool()
    data_to_process = range(10)  # Example data set
    pool.map(process_data, data_to_process)
    pool.close()
    pool.join()
The output of this code snippet (worker processes may interleave their output):

Processing 0
Processing 1
Processing 2
...
Processing 9
This code snippet illustrates how to use the multiprocessing module’s Pool class to create a pool of worker processes that execute a function on the provided data in parallel.
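Because pool.map also collects return values, a variant like the following shows results coming back in input order; sum_of_squares is a stand-in CPU-bound function chosen purely for illustration:

from multiprocessing import Pool

def sum_of_squares(n):
    # Stand-in CPU-bound work: pure-Python arithmetic holds the GIL,
    # so separate processes are what allow real parallelism here.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool() as pool:
        # map distributes inputs across worker processes and returns
        # results in the same order as the inputs.
        results = pool.map(sum_of_squares, [10_000, 20_000, 30_000])
    print(results)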
Method 3: Using the asyncio module
Python’s asyncio module provides facilities for writing asynchronous I/O tasks around an event loop. It’s ideal for handling a large number of network connections concurrently without the overhead of creating threads, which makes it perfect for high-level structured network code.
Here’s an example:
import asyncio
import aiohttp

async def download_url(url, session):
    async with session.get(url) as response:
        print(f"Downloaded {url}")

async def main():
    async with aiohttp.ClientSession() as session:
        urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
        tasks = [download_url(url, session) for url in urls]
        await asyncio.gather(*tasks)

asyncio.run(main())
The output of this code snippet (completion order may vary):

Downloaded http://example.com/a
Downloaded http://example.com/b
Downloaded http://example.com/c
This code snippet demonstrates asynchronous I/O in Python using asyncio and aiohttp. Each URL is downloaded concurrently, without waiting for the others to complete. The asyncio.gather function schedules the asynchronous tasks concurrently and waits until all of them have finished.
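In real code you often want to cap how many downloads run at once. One common pattern, sketched here with an assumed limit of two concurrent requests, wraps each task in an asyncio.Semaphore:

import asyncio
import aiohttp

async def download_url(url, session, semaphore):
    # At most two downloads hold the semaphore at any moment.
    async with semaphore:
        async with session.get(url) as response:
            body = await response.read()
            print(f"Downloaded {url} ({len(body)} bytes)")

async def main():
    semaphore = asyncio.Semaphore(2)  # assumed concurrency limit
    urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(download_url(u, session, semaphore) for u in urls))

asyncio.run(main())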
Method 4: Using the concurrent.futures module
The concurrent.futures module provides a high-level interface for asynchronously executing callables, with ThreadPoolExecutor for I/O-bound tasks and ProcessPoolExecutor for CPU-bound tasks. It simplifies the management of a pool of threads or processes and presents a future-based API.
Here’s an example:
from concurrent.futures import ThreadPoolExecutor

def fetch_url(url):
    # Code to fetch URL goes here
    print(f"Fetched {url}")

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

with ThreadPoolExecutor() as executor:
    executor.map(fetch_url, urls)
The output of this code snippet (thread scheduling may reorder the lines):

Fetched http://example.com/a
Fetched http://example.com/b
Fetched http://example.com/c
This snippet uses ThreadPoolExecutor from the concurrent.futures module to issue the URL fetch calls in parallel across a pool of threads.
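The executor also exposes the future-based API mentioned above. A minimal sketch using submit() and as_completed(), where fetch_url remains a placeholder, handles each result as soon as its thread finishes rather than in submission order:

from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_url(url):
    # Placeholder for real fetching code; returns a message instead.
    return f"Fetched {url}"

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch_url, url) for url in urls]
    for future in as_completed(futures):
        # result() returns the value, re-raising any worker exception.
        print(future.result())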
Bonus One-Liner Method 5: Using List Comprehension with Threads
If you want a quick and dirty way to spawn threads without much setup, Python’s list comprehension combined with the threading module can serve as a one-liner that starts multiple threads for simple operations.
Here’s an example:
import threading
[threading.Thread(target=lambda i=i: print(f"Task {i}")).start() for i in range(3)]
The output of this code snippet (order may vary):

Task 0
Task 1
Task 2
This one-liner creates and starts three threads, each printing “Task” followed by its number. Binding i as a default argument (i=i) captures the current value when each thread is created; without it, all three lambdas would share the loop variable and could all print the final value of i.
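If the main program needs to wait for these threads, the fire-and-forget one-liner isn’t enough; a slightly longer sketch of the same idea keeps the Thread objects so they can be joined:

import threading

# Keep references to the threads instead of discarding them,
# so the main program can wait for every task to finish.
threads = [threading.Thread(target=print, args=(f"Task {i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("All tasks done.")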
Summary/Discussion
- Method 1: threading. Ideal for I/O-bound tasks. Less efficient for CPU-heavy tasks due to the GIL.
- Method 2: multiprocessing. Great for CPU-intensive work by leveraging multiple CPUs. Heavier than threads in terms of resources.
- Method 3: asyncio. Suited for I/O-bound tasks with an asynchronous model. Can be complex to understand and implement properly.
- Method 4: concurrent.futures. Provides a high-level API for thread- or process-based parallelism. Simplifies working with futures and callbacks.
- Bonus Method 5: List comprehension with threads. Quick and simple for launching lightweight tasks. Lacks robust control of threads compared to other methods.