Ever gotten this error when trying to generate a large body of text with GPT-4?
This model's maximum context length is <8192> tokens. However, your messages resulted in <a Gazillion> tokens. Please reduce the length of the messages.
So have I. But I've just discovered that the new "gpt-4-32k"
model is slowly rolling out, as reported by several early adopters.
Here’s an example API call you can issue if you already have access in the playground:
payload = {
    "model": "gpt-4-32k",
    "messages": [
        {"role": "system", "content": "You are William Shakespeare."},
        {"role": "user", "content": "Write a story on love."}
    ]
}
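To actually send this payload, you can sketch the request with nothing but the standard library (a minimal sketch: the endpoint and headers are OpenAI's documented chat completions interface, and it assumes your API key is available in the `OPENAI_API_KEY` environment variable; the payload is repeated so the snippet is self-contained):

```python
import json
import os
import urllib.request

payload = {
    "model": "gpt-4-32k",
    "messages": [
        {"role": "system", "content": "You are William Shakespeare."},
        {"role": "user", "content": "Write a story on love."},
    ],
}

def chat_completion(payload: dict) -> dict:
    """POST the payload to OpenAI's chat completions endpoint."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With access and a valid key, you would call:
# response = chat_completion(payload)
# print(response["choices"][0]["message"]["content"])
```

If you don't have gpt-4-32k access yet, the same call with `"model": "gpt-4"` works against the regular 8k context window.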
However, please note that only a select group of users has access so far. You can check whether you do in the Playground under Mode > Chat > Model > gpt-4-32k.
This is how it looks if you don’t have access yet:
Some Interesting Facts on GPT-4-32k 🤯
OpenAI’s gpt-4-32k model offers a larger context window of 32k tokens, enabling a broad range of applications: it simplifies Q&A chatbot creation by letting you fit an entire (small) database into the 32k prompt, summarizes large data sets effectively, and can even interpret complex documents like the IRS tax code. However, the rollout of GPT-4 is waitlist-based, with earlier joiners getting access sooner.
- OpenAI released GPT-4 32k model to early adopters.
- It appears to be rolled out in the order in which users joined the waitlist.
- The 32k model can handle 32,000 tokens of context.
- One token generally corresponds to ~4 characters of common English text, which equates to roughly ¾ of a word.
- The 32k model can thus process context equivalent to approximately 24,000 words.
- Regarding page count, this roughly translates to around 48-50 single-spaced pages of text.
- There’s a new tokenizer for GPT-4: https://tiktokenizer.vercel.app/.
- The cost of using this model is high, making it potentially inaccessible for wider usage: at $0.06 per 1k prompt tokens, a 20k-token prompt costs about $1.20.
- Some users are exploring chat history compression techniques to mitigate high usage costs.
- For code, the model can handle approximately 2k to 4.5k lines, depending on the language and formatting.
- There are ongoing discussions about possible strategies for managing larger document interactions within the 32k token limit.
- However, it’s worth noting that the OpenAI API is stateless: the entire conversation, including the model’s response, must fit within the 32k-token limit.
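Because the API is stateless, every request must resend the whole conversation, which is why users look into compressing or trimming chat history. Here is a minimal sketch of budget-based trimming (the function name, budget numbers, and the ~4-characters-per-token estimate are my own illustration, not an OpenAI API):

```python
def trim_history(messages, max_tokens=32_000, reserve_for_reply=2_000):
    """Keep the system message plus as many of the most recent
    messages as fit into an estimated token budget."""
    budget = max_tokens - reserve_for_reply
    system, rest = messages[0], messages[1:]
    budget -= len(system["content"]) // 4  # rough token estimate
    kept = []
    for msg in reversed(rest):  # newest messages first
        cost = len(msg["content"]) // 4
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(msg)
    return [system] + list(reversed(kept))
```

Real implementations would count tokens exactly (e.g., with OpenAI's tiktoken library) or summarize dropped turns instead of discarding them, but the budgeting idea is the same.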
Personally, I’d use the 32,000 tokens as context, possibly integrating larger embeddings, to provide a truly helpful, context-sensitive, intelligent Q&A bot. Read more here: 👇
🤖 Recommended: Building a Q&A Bot with OpenAI: A Step-by-Step Guide to Scraping Websites and Answering Questions
How Many Words Can GPT-4-32k Generate (Max)?
32k tokens yield roughly 3/4 of 32k, i.e., 24k words.
How Many Pages Does GPT-4-32k Generate (Max)?
Assuming that each page has 500 words, that’s 24000/500 = 48 pages.
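The back-of-the-envelope arithmetic above can be wrapped in a few helpers (these are pure heuristics based on the ~4-characters-per-token and ~¾-word-per-token rules of thumb, not tokenizer-accurate counts):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for common English text.
    return max(1, round(len(text) / 4))

def estimate_words(tokens: int) -> float:
    # One token is roughly 3/4 of an English word.
    return tokens * 0.75

def estimate_pages(tokens: int, words_per_page: int = 500) -> float:
    # Assumes ~500 words per single-spaced page.
    return estimate_words(tokens) / words_per_page

print(estimate_words(32_000))   # → 24000.0 words
print(estimate_pages(32_000))   # → 48.0 pages
```

For exact counts, use the tokenizer linked above, since real tokenization depends on the specific text.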
What If You Don’t Have Access to GPT-4-32k Yet?
As you wait for GPT-4-32k access, you can play with open-source models such as MosaicML’s MPT-7B, which recently got a long-context variant supporting roughly 64k tokens.
See here for more: 👇
🧑‍💻 Recommended: MPT-7B: A Free Open-Source Large Language Model (LLM)