How do you copy GPT-4 without actually copying the model weights? In this article, you’ll learn how!
💡 Researchers from UC Berkeley have unveiled Starling-7B, an innovative large language model (LLM) trained using Reinforcement Learning from AI Feedback (RLAIF), as opposed to the Reinforcement Learning from Human Feedback (RLHF) approach used by many competitors.
Starling-7B uses the GPT-4-labeled ranking dataset Nectar and a novel reward-training and policy-tuning pipeline. The Nectar dataset was created by prompting seven LLMs, including Claude 2 and Llama 2, and asking GPT-4 to rank their responses.
The Nectar dataset is a comprehensive collection of 183K chat prompts and 3.8M pairwise comparisons across various models.
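A quick back-of-envelope check shows how those two numbers fit together, assuming roughly seven responses per prompt as described above: each prompt yields "7 choose 2" pairwise comparisons.

```python
from math import comb

prompts = 183_000           # chat prompts in Nectar
responses_per_prompt = 7    # one response from each of seven LLMs

# Each prompt yields one comparison per unordered pair of responses.
pairs_per_prompt = comb(responses_per_prompt, 2)   # 21 pairs
total_comparisons = prompts * pairs_per_prompt

print(pairs_per_prompt)     # 21
print(total_comparisons)    # 3843000, i.e. roughly the 3.8M reported
```

So the reported 3.8M pairwise comparisons line up almost exactly with 183K prompts times 21 pairs each.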
The idea is to converge on a single model (i.e., Starling-7B) that provides a best-in-class answer to every question.
And in fact, Starling-7B has made waves by achieving a score of 8.09 on MT-Bench, surpassing nearly all models to date, including "star LLMs" like Claude 2 and Llama 2.
Well, OpenAI’s GPT-4 and its Turbo variant still outperform the open-source competitor, but the gap is closing! In a way, the Nectar generation process described above could hardly produce a model that outperforms GPT-4, given that GPT-4 itself ranks the responses of the weaker models.
Roughly speaking, the best this approach can hope for is to match GPT-4’s performance!
Yet, despite improvements in helpfulness and safety, Starling-7B’s basic capabilities in knowledge-based QA, math, and coding need further development.
Additionally, its susceptibility to jailbreaking attempts and occasional verbosity highlight areas for future improvement.
Many open-source researchers use GPT-4 this way to effectively clone the model without copying its weights. Researchers are not permitted to train directly on ChatGPT’s outputs. However, they can use GPT-4 as a gold standard for ranking responses.
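The "gold standard" idea boils down to this: generate candidate responses with open models, then have the judge model order them by quality. A minimal sketch, where `judge_score` is a hypothetical stand-in for a real GPT-4 API call:

```python
def rank_responses(prompt, responses, judge_score):
    """Rank candidate responses best-first using an external judge.

    `judge_score` is a stand-in for a call to a judge model
    (e.g., GPT-4) that returns a quality score for a
    (prompt, response) pair.
    """
    return sorted(responses, key=lambda r: judge_score(prompt, r), reverse=True)

# Toy judge that simply prefers longer answers, purely for illustration:
def toy_judge(prompt, response):
    return len(response)

ranked = rank_responses(
    "What is RLAIF?",
    ["Short.", "A longer, more detailed answer."],
    toy_judge,
)
print(ranked[0])   # "A longer, more detailed answer."
```

The resulting best-first rankings are exactly the kind of pairwise preference data a reward model can then be trained on.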
The takeaway is simple: keep using GPT-4 and GPT-4 Turbo if you can (e.g., pay for the subscription or use FinxterGPT on any of our blog posts).
But don’t worry if you cannot afford it: the open-source community and researchers worldwide are working on open-source LLMs that keep improving rapidly. With creative solutions like the dataset construction seen in this article, the open-source community stays on OpenAI’s heels.
The model has already launched on the model comparison site.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.