Starling-7B: UC Berkeley’s New Open-Source LLM


How do you copy GPT-4 without actually copying the model weights? In this article, you’ll learn how!

πŸ’‘ Researchers from UC Berkeley have unveiled Starling-7B, an innovative large language model (LLM) trained using Reinforcement Learning from AI Feedback (RLAIF), as opposed to the Reinforcement Learning from Human Feedback (RLHF) approach used by many competitors.
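RLAIF follows the same recipe as RLHF, except that the preference labels come from a model (here, GPT-4) instead of human raters: a reward model is first trained on pairwise preferences, then the LLM policy is tuned against that reward. The Starling paper has its own training details; the snippet below is only a minimal sketch of the standard pairwise (Bradley-Terry) preference loss commonly used for reward models. The function name and reward scores are illustrative, not from the paper.

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Pushes the reward model to score the preferred response
    (here, the one GPT-4 ranked higher) above the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of -log(sigmoid(margin)): log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# The loss shrinks when the reward model orders the pair correctly
# and grows when it prefers the rejected response:
print(pairwise_preference_loss(2.0, 0.0))  # small loss: correct ordering
print(pairwise_preference_loss(0.0, 2.0))  # large loss: wrong ordering
```

During policy tuning, the frozen reward model scores the policy's generations, and the policy is optimized (e.g., with PPO) to maximize that score.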

Starling-7B is trained on Nectar, a GPT-4-labeled ranking dataset, using a novel reward-model training and policy-tuning pipeline. Nectar was created by prompting seven LLMs, including Claude 2 and Llama 2, and asking GPT-4 to rank their responses.

The Nectar dataset is a comprehensive collection of 183K chat prompts and 3.8M pairwise comparisons across various models.
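Those two headline numbers are consistent: one ranking of K responses expands into K*(K-1)/2 pairwise comparisons, so ranking 7 responses per prompt yields 21 pairs, and 183K prompts times 21 pairs is roughly 3.8M comparisons. A minimal sketch of that expansion (function and response names are my own, not from the Nectar pipeline):

```python
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Expand one ranking (best first) into (chosen, rejected) pairs.

    Every response is preferred over every response ranked below it,
    so K ranked responses yield K*(K-1)/2 pairwise comparisons.
    """
    return list(combinations(ranked_responses, 2))

# One ranking of 7 responses (as in Nectar) yields 21 pairs;
# 183K prompts x 21 pairs per prompt gives roughly 3.8M comparisons.
pairs = ranking_to_pairs([f"response_{i}" for i in range(1, 8)])
print(len(pairs))  # 21
```

Because `combinations` preserves input order, the higher-ranked response always appears first in each pair, which is exactly the (chosen, rejected) format a pairwise reward loss consumes.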

The idea is to converge to one model (i.e., Starling-7B) that provides a best-in-class answer for every question.

And in fact, Starling-7B has made waves by achieving a score of 8.09 on MT-Bench, surpassing nearly all models to date, including “star LLMs” like Claude 2 and Llama 2.

Well, OpenAI’s GPT-4 and its Turbo variant still outperform the open-source competitor, but the gap is closing! In a way, a model trained on the Nectar dataset described above can’t really outperform GPT-4, given that GPT-4 itself is used to rank the responses of the weaker models.

Roughly speaking, all they can hope to achieve with this approach is to reach GPT-4’s performance!

Yet, despite improvements in helpfulness and safety, Starling-7B’s basic capabilities in knowledge-based QA, math, and coding need further development.

Additionally, its susceptibility to jailbreaking attempts and occasional verbosity highlight areas for future improvement.

Many open-source researchers use GPT-4 this way to effectively clone the model without actually cloning it. Researchers are not permitted to use ChatGPT’s outputs to train competing models. However, they can use GPT-4 as a gold standard for ranking responses.

The takeaway is simple: keep using GPT-4 and GPT-4 Turbo if you can (e.g., pay for the subscription or use FinxterGPT on any of our blog posts).

But don’t worry if you cannot afford it: the open-source community and researchers worldwide are building open-source LLMs that keep improving rapidly. With creative dataset-construction techniques like the one described in this article, the open-source community stays on OpenAI’s heels.

The model has already launched on a popular model comparison site.

πŸ”— Official website
πŸͺ„ Nectar Dataset
πŸ§‘β€πŸ’» Model Card Huggingface