How do you copy GPT-4 without actually copying the model weights? In this article, you’ll learn how!
Researchers from UC Berkeley have unveiled Starling-7B, an innovative large language model (LLM) trained using Reinforcement Learning from AI Feedback (RLAIF), as opposed to the Reinforcement Learning from Human Feedback (RLHF) approach used by many competitors.
Starling-7B uses the GPT-4-labeled ranking dataset Nectar and a novel reward training and policy tuning pipeline. The Nectar dataset was created by prompting seven LLMs, including Claude 2 and Llama 2, and asking GPT-4 to rank their responses.
The Nectar dataset is a comprehensive collection of 183K chat prompts and 3.8M pairwise comparisons across various models.
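The two numbers fit together neatly: one GPT-4 ranking of seven responses expands into C(7, 2) = 21 pairwise preferences, and 183K prompts × 21 ≈ 3.8M comparisons. Here is a minimal sketch of that expansion step; the function name and record format are illustrative, not the actual Nectar pipeline:

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    """Expand one ranking (best response first) into pairwise preferences."""
    pairs = []
    # combinations preserves order, so index i < j means response i was ranked higher.
    for (i, better), (j, worse) in combinations(enumerate(ranked_responses), 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# Seven ranked responses per prompt yield C(7, 2) = 21 pairwise comparisons.
pairs = ranking_to_pairs("What is RLAIF?", [f"response_{k}" for k in range(7)])
print(len(pairs))  # 21
```

Each `chosen`/`rejected` pair then becomes one training example for the reward model.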
The idea is to converge to one model (i.e., Starling-7B) that provides a best-in-class answer for all questions.
And in fact, Starling-7B has made waves by achieving a score of 8.09 on MT-Bench, surpassing nearly all models to date, including “star LLMs” like Claude 2 and Llama 2.
Well, OpenAI’s GPT-4 and its Turbo variant still outperform the open-source competitor, but it’s getting closer! In a way, a model trained on the Nectar dataset can’t really outperform GPT-4, given that GPT-4 itself is used to rank the responses of the inferior models.
Roughly speaking, all they can hope to achieve with this approach is to reach GPT-4’s performance!
Yet, despite improvements in helpfulness and safety, Starling-7B’s basic capabilities in knowledge-based QA, math, and coding need further development.
Additionally, its susceptibility to jailbreaking attempts and occasional verbosity highlight areas for future improvement.
Many open-source researchers use GPT-4 this way to effectively clone the model without actually cloning it. OpenAI’s terms prohibit using ChatGPT’s outputs to train competing models. However, researchers can use GPT-4 as a gold standard for ranking responses.
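The “GPT-4 as judge” step boils down to assembling a ranking prompt and parsing the judge’s answer. Here is a hypothetical sketch of the prompt-building half; the function name and wording are illustrative, and the actual Nectar prompting (which also mitigates positional bias) is more elaborate:

```python
def build_judge_prompt(question, responses):
    """Assemble a ranking prompt to send to a judge model such as GPT-4."""
    lines = [
        f"Rank the following {len(responses)} responses to the question, best first.",
        f"Question: {question}",
        "",
    ]
    for i, response in enumerate(responses, start=1):
        lines.append(f"Response {i}: {response}")
    lines.append("")
    lines.append("Output the ranking as a comma-separated list of response numbers.")
    return "\n".join(lines)

prompt = build_judge_prompt("What is RLAIF?", ["Answer A", "Answer B", "Answer C"])
print(prompt)
```

The returned string would then be sent to the judge model via its chat API, and the parsed ranking expanded into pairwise training data.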
The takeaway is simple: keep using GPT-4 and GPT-4 Turbo if you can (e.g., pay for the subscription or use FinxterGPT on any of our blog posts).
But don’t worry if you cannot afford it: the open-source community and researchers worldwide are working on open-source LLMs that keep improving rapidly. With creative dataset-construction techniques like the one described in this article, the open-source community stays on OpenAI’s heels.
The model has already launched on the LMSYS Chatbot Arena model comparison site.
- Official website
- Nectar dataset
- Model card on Hugging Face