Can I Run OpenAI's API in Parallel? Yes, with Python Async!

If you’re like me, you’re using the OpenAI API a lot in your Python code. So the natural question arises: “How can I use OpenAI’s API asynchronously by issuing multiple requests at once?”

I will give you my code for asynchronous OpenAI API requests for copy and paste below. But first, allow me to give you a word of warning from coder to coder:

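Generally speaking, many coders reach for asynchronous code because they want to, not because they need it. Asynchronous code tends to be harder to read and debug, more error-prone, and less predictable than sequential code, and it only pays off when your program spends most of its time waiting on I/O such as network requests.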

Specifically, when using asynchronous requests against the OpenAI API, you should be aware of the rate limits that may become the bottleneck of your asynchronous Python app:

https://platform.openai.com/account/rate-limits
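One way to stay under those limits is to cap how many requests are in flight at once. Here is a minimal sketch using asyncio.Semaphore. Note that fake_request, limited_request, run_all, and the limit of 2 are my own placeholders rather than anything from the OpenAI API, and the short sleep merely stands in for the real network call:

```python
import asyncio

MAX_CONCURRENT = 2  # assumed limit; tune it to your own account's rate limits


async def fake_request(prompt, stats):
    # Hypothetical stand-in for a real API call; it only sleeps briefly
    # and tracks how many calls are running at the same time.
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return f"answer to: {prompt}"


async def limited_request(semaphore, prompt, stats):
    # At most MAX_CONCURRENT coroutines get past this line at once.
    async with semaphore:
        return await fake_request(prompt, stats)


async def run_all(prompts):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    tasks = (limited_request(semaphore, p, stats) for p in prompts)
    results = await asyncio.gather(*tasks)
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all([f"prompt {i}" for i in range(6)]))
    print(f"{len(results)} results, peak concurrency {peak}")
```

In a real app you would swap fake_request for your actual request coroutine and keep the semaphore wrapper as is.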

Okay, enough of the warnings. Let’s do it: 👇

Method 1: Using OpenAI API Calls Asynchronously

I have developed the following code to issue OpenAI requests asynchronously in Python. Make sure to replace the API key placeholder ('sk-...') and the example prompts with your own:

import aiohttp
import asyncio
import openai

# Set up your OpenAI API key
openai.api_key = 'sk-...'

# Example prompts
prompts = ["What is the capital of France?",
           "How does photosynthesis work?",
           "Who wrote 'Pride and Prejudice'?"]

async def async_openai_request(prompt):
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {openai.api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "temperature": 1,
        "max_tokens": 150,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0
    }

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as response:
            return await response.json()


async def main():

    # Gather results from all asynchronous tasks
    results = await asyncio.gather(*(async_openai_request(prompt) for prompt in prompts))

    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt}")
        print(f"Response: {result['choices'][0]['message']['content']}\n")


# Run the main function
asyncio.run(main())

I’ll give you the output at the end of this walkthrough. But first let’s go through the code step by step to ensure you understand everything.

Step 1: Imports

The code begins by importing three essential libraries.

aiohttp is used for making asynchronous HTTP requests, so the program can send and receive data from the OpenAI API without blocking the event loop.

asyncio provides the tools to write concurrent code using the async/await syntax, so multiple requests can be in flight at the same time.

Lastly, the openai library is the official OpenAI API client. In this method it is only used to hold the API key; the requests themselves go through aiohttp.

Step 2: Set up the OpenAI API Key

Following the imports, the OpenAI API key is set up. It’s important to note that hard-coding API keys directly in the code is not a recommended practice. For security reasons, it’s better to use environment variables or configuration files to store such sensitive information.
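As a minimal sketch of the environment-variable approach, the helper below reads the key at startup and fails loudly if it is missing. The function name load_api_key is my own, and I'm assuming the variable is called OPENAI_API_KEY, which is the conventional name:

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    # Read the key from the environment instead of hard-coding it.
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key


if __name__ == "__main__":
    try:
        load_api_key()
        print("API key loaded.")
    except RuntimeError as err:
        print(err)
```

With that in place, the hard-coded line becomes openai.api_key = load_api_key().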

Step 3: Asynchronous Function for API Requests

The async_openai_request function is defined to handle asynchronous requests to the OpenAI API.

async def async_openai_request(prompt):
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {openai.api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "temperature": 1,
        "max_tokens": 150,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0
    }

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as response:
            return await response.json()

When this function is called with a specific prompt, it prepares and sends an asynchronous request to the OpenAI API’s chat completions endpoint.

The headers for the request include the authorization, which uses the API key, and the content type. The payload (data) sent to the API specifies the model to use (gpt-4), the messages list containing the user’s prompt, and sampling parameters such as temperature, top_p, frequency_penalty, and presence_penalty that influence the output. The max_tokens value of 150 caps the length of each answer, which is why a long response can end mid-sentence.

The function then opens an asynchronous session using aiohttp and sends a POST request with the specified data and headers. Once the response is received, its JSON body is parsed and returned as a Python dictionary.
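One thing the function does not do is check whether the request actually succeeded. If the API rejects a request (for example, because of a rate limit), the returned dictionary will not contain a 'choices' key, and indexing into it raises a KeyError. Here is a small sketch of a safer way to pull out the answer. The helper name extract_content is my own, the error shape {"error": {"message": ...}} is an assumption about the response body, and both sample dictionaries are hand-written for illustration, not real API output:

```python
def extract_content(result):
    # Assumption: error payloads look like {"error": {"message": "..."}}.
    if "error" in result:
        message = result["error"].get("message", "unknown error")
        raise RuntimeError(f"API error: {message}")
    return result["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Hand-written sample payloads for illustration only.
    ok = {"choices": [{"message": {"role": "assistant", "content": "Paris."}}]}
    bad = {"error": {"message": "Rate limit reached"}}

    print(extract_content(ok))
    try:
        extract_content(bad)
    except RuntimeError as err:
        print(err)
```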

Step 4: Main Asynchronous Function

The main function encapsulates the primary logic of the program. It works on the list of example prompts defined at the top of the script.

async def main():

    # Gather results from all asynchronous tasks
    results = await asyncio.gather(*(async_openai_request(prompt) for prompt in prompts))

    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt}")
        print(f"Response: {result['choices'][0]['message']['content']}\n")

For each of these prompts, an asynchronous request is sent to the OpenAI API using the previously defined async_openai_request function. The asyncio.gather function runs all of these coroutines concurrently and returns their results in the same order as the prompts, regardless of which request finishes first. Once all responses are received, the function iterates over the prompts and their corresponding results, printing them out for the user.
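To make the ordering behavior concrete, and to show what happens when one request fails, here is a small sketch with simulated requests. The fake_request coroutine and its delays are placeholders of my own. Passing return_exceptions=True tells asyncio.gather to hand back exceptions as results instead of raising the first one:

```python
import asyncio


async def fake_request(prompt, delay):
    # Hypothetical stand-in for an API call; fails for one specific prompt.
    await asyncio.sleep(delay)
    if prompt == "bad":
        raise ValueError("simulated request failure")
    return prompt.upper()


async def demo():
    # The first job is the slowest, yet its result still comes back first.
    jobs = [("slow", 0.03), ("bad", 0.02), ("fast", 0.01)]
    return await asyncio.gather(
        *(fake_request(p, d) for p, d in jobs),
        return_exceptions=True,  # collect exceptions instead of raising
    )


if __name__ == "__main__":
    for item in asyncio.run(demo()):
        print(repr(item))
```

Without return_exceptions=True, the ValueError would propagate out of gather and you would not get the results of the requests that succeeded.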

Step 5: Execution

Finally, the asyncio.run(main()) command is used to execute the main function. When the code is run, it will send asynchronous requests for each of the example prompts and display the responses in the console.

The output is:

Prompt: What is the capital of France?
Response: The capital of France is Paris.

Prompt: How does photosynthesis work?
Response: Photosynthesis is a process used by plants, algae and certain bacteria to convert sunlight, water and carbon dioxide into food and oxygen. This process happens inside the chloroplasts, specifically using chlorophyll, the green pigment involved in photosynthesis.

Photosynthesis occurs in two stages: the light-dependent reactions and the light-independent reactions, also known as the Calvin Cycle.

In the light-dependent reactions, which take place in the thylakoid membrane of the chloroplasts, light energy is converted into chemical energy. When light is absorbed by chlorophyll, it excites the electrons, increasing their energy level and triggering a series of chemical reactions. Water molecules are split to produce oxygen, electrons, and hydrogen ions. The oxygen is released into the

Prompt: Who wrote 'Pride and Prejudice'?
Response: 'Pride and Prejudice' was written by Jane Austen.
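If you want to convince yourself that the requests really overlap, you can time a sequential loop against asyncio.gather. The sketch below uses a simulated request (fake_request with a 0.05-second sleep, both my own placeholders) so it runs without an API key:

```python
import asyncio
import time


async def fake_request(delay=0.05):
    # Hypothetical stand-in for one API round trip.
    await asyncio.sleep(delay)


async def sequential(n):
    # Await each request before starting the next one.
    start = time.perf_counter()
    for _ in range(n):
        await fake_request()
    return time.perf_counter() - start


async def concurrent(n):
    # Start all requests at once and wait for all of them together.
    start = time.perf_counter()
    await asyncio.gather(*(fake_request() for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {asyncio.run(sequential(3)):.2f}s")
    print(f"concurrent: {asyncio.run(concurrent(3)):.2f}s")
```

The sequential version takes roughly the sum of the delays, while the concurrent version takes roughly as long as the single slowest request.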

Method 2: Using OpenAI ChatCompletion’s acreate()

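An alternative to sending raw HTTP requests to the OpenAI API endpoint is to use the openai library’s native asynchronous methods, as noted in the docs: “Async support is available in the API by prepending a to a network-bound method”. In other words, create() becomes acreate(). Note that this interface belongs to versions of the openai package before 1.0; from version 1.0 on, the library exposes an AsyncOpenAI client instead and ChatCompletion.acreate() is no longer available.

In the following code, the main change is that the API calling function now uses the acreate() method; the prompts list has also moved into main():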

import asyncio
import openai

# Set up your OpenAI API key
openai.api_key = 'sk-...'


async def create_chat_completion(prompt):
    chat_completion_resp = await openai.ChatCompletion.acreate(model="gpt-4", messages=[{"role": "user", "content": prompt}])
    return chat_completion_resp


async def main():
    # Example prompts
    prompts = ["What is the capital of France?",
               "How does photosynthesis work?",
               "Who wrote 'Pride and Prejudice'?"]

    # Gather results from all asynchronous tasks
    results = await asyncio.gather(*(create_chat_completion(prompt) for prompt in prompts))

    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt}")
        print(f"Response: {result['choices'][0]['message']['content']}\n")


# Run the main function
asyncio.run(main())

Output:

Prompt: What is the capital of France?
Response: The capital of France is Paris.

Prompt: How does photosynthesis work?
Response: Photosynthesis is the process by which green plants, algae and some bacteria convert light energy, usually from the sun, into chemical energy in the form of glucose (sugar). This process is essential for life on earth as it is the primary source of all oxygen in the atmosphere.

This process takes place in a part of the plant cell called the chloroplast, more specifically, within the chlorophyll molecules, which absorbs sunlight (specifically, photons) and gives plants their green color. 

It can be divided into two main stages: the light-dependent reactions and the light-independent reactions or Calvin Cycle.

In the light-dependent reactions, which take place in the thylakoid membrane of the chloroplasts, light is absorbed by the chlorophyll and converted into chemical energy - in the form of ATP (Adenosine triphosphate) and NADPH (Nicotinamide adenine dinucleotide phosphate). This process also splits water molecules (H2O) into oxygen (O2), which is released into the atmosphere, and hydrogen ions (H+), which are used in the next stage of photosynthesis.

In the second stages, the light-independent reactions or Calvin Cycle, which take place in the stroma of the chloroplasts, the ATP and NADPH produced in the light-dependent reactions, along with carbon dioxide (CO2) from the atmosphere, are used to produce glucose (sugar), which is used as an energy source for plant growth and development.

In summary, during photosynthesis, light energy is converted into chemical energy, which fuels the organisms' activities, and oxygen is released into the atmosphere as a byproduct.

Prompt: Who wrote 'Pride and Prejudice'?
Response: 'Pride and Prejudice' was written by Jane Austen.
