Mixtral 8x7B Outperforms Llama 2 and Beats ChatGPT in Speed by 21x to 44x!

What Is Mixtral 8x7B?

Mixtral 8x7B is a cutting-edge language model developed by Mistral AI. It outperforms the Llama 2 70B model on various benchmarks while being six times faster. Notably, it speaks multiple languages and is a skilled coder. Plus, it can handle a sequence length (context window) of 32,000 tokens.

How Fast Is Mixtral 8x7B? Is It Faster Than ChatGPT?

Mixtral 8x7B is blazingly fast. It’s up to 44x faster than ChatGPT (GPT-4 Turbo) on many simple coding prompts (e.g., Fibonacci) and 21x faster at writing a 10-line poem.

  • Asking Mixtral 8x7B for both recursive and iterative Fibonacci took 0.29 seconds. Compare this to ChatGPT (GPT-4 Turbo), which took 12.91 seconds for the exact same prompt!
  • Asking Mixtral 8x7B to write a poem about Finxter in 10 lines took 0.35 seconds, whereas ChatGPT (GPT-4 Turbo) took 7.59 seconds. (A timing sketch follows below.)
(Screenshots: the Mixtral 8x7B poem and the ChatGPT poem.)
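For context, here is a minimal sketch of how such timings can be measured. The generate_completion helper is a hypothetical placeholder for whichever API client or local model you use (for example, the transformers snippet further below); only the stopwatch logic around it is the point.

import time

def generate_completion(prompt):
    # Hypothetical placeholder: swap in a call to your API client
    # or local model here.
    raise NotImplementedError

prompt = "Write a recursive and an iterative Fibonacci function in Python."
start = time.perf_counter()
answer = generate_completion(prompt)
elapsed = time.perf_counter() - start
print(f"Completion took {elapsed:.2f} seconds")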

Can I Use Mixtral 8x7B Commercially (License)?

You can use the Mixtral 8x7B model commercially. It’s released under the Apache 2.0 license, a permissive open-source license allowing commercial use. You can use, modify, and distribute the original or derivative works based on it, even commercially.

Mixtral 8x7B is an open-weight model. This means it can be used through Mistral AI’s API or can be deployed independently.
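If you go the hosted route, a request to Mistral AI’s chat completions endpoint looks roughly like the sketch below. Treat the endpoint path and the model identifier ("open-mixtral-8x7b" here) as assumptions that may not match your account’s current API version; check Mistral AI’s documentation before relying on them.

import os
import requests

# Assumed endpoint and model name; verify against Mistral AI's API docs.
url = "https://api.mistral.ai/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}
payload = {
    "model": "open-mixtral-8x7b",
    "messages": [
        {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}
    ],
}

response = requests.post(url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])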

The model is part of Mistral AI’s commitment to open science, community, and free software, as they release many of their models and deployment tools under permissive licenses.

What’s the Architecture of Mixtral 8x7B?

The architecture of Mixtral 8x7B is interesting:

It employs a Sparse Mixture of Experts (SMoE) architecture: for each token, a router selects just 2 of the 8 expert feed-forward networks per layer, so only a fraction of the total parameters is active during inference (see the toy sketch below). This strategy optimizes processing efficiency and speed. The model is also designed to be more accessible than the 800-pound gorilla in the space, GPT-4, both in terms of usability and the computational resources required to run it.

You be the judge.
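To make the routing idea concrete, here is a toy sketch of top-2-of-8 expert routing in plain NumPy. It illustrates the general SMoE mechanism only; it is not Mixtral’s actual implementation, and all names and sizes (d_model, n_experts, top_k) are made up for the example.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# One tiny "expert" per slot (a single weight matrix here).
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))  # routing weights

def smoe_layer(x):
    # x: a single token embedding of shape (d_model,)
    logits = x @ router                      # score each expert
    top = np.argsort(logits)[-top_k:]        # keep the 2 best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the 2 chosen experts run; the other 6 stay idle for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(smoe_layer(token).shape)  # (16,)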

The results are impressive, though! Have a look at this comparison table against Meta’s open-source LLM, Llama 2:

Where Can I Try Mixtral 8x7B?

There are various ways to get started with Mixtral 8x7B: download it directly from Mistral AI, use the Mistral AI platform, or access it through platforms like Perplexity Labs, Hugging Face, or Together AI.

My personal favorite is Perplexity Labs: just select mixtral-8x7b-instruct from the model dropdown:

How Can I Run It Using the Transformers Library (Hugging Face)?

The following snippet uses the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"

# Download the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate up to 20 new tokens.
text = "Hello world!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)

# Decode the generated token IDs back into text.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
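Note that the full Mixtral checkpoint is far too large for most consumer machines. A common workaround, sketched below under the assumption that you have a CUDA GPU and the accelerate and bitsandbytes packages installed, is to load the weights in 4-bit precision:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize the weights to 4 bits on the fly to cut memory use dramatically.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs/CPU
)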

More here.

πŸš€ Recommended: Prompt Engineering with Llama 2 (Full Course)