- The leaked document is titled “We Have No Moat, And Neither Does OpenAI.”
- It argues that open-source AI development is winning and that Google and other companies have no competitive advantage or “moat” in the field.
- The document suggests that Google and other companies should focus on building tools and infrastructure that support open-source AI development rather than trying to compete with it.
- The document provides a fascinating insight into the state of AI development and the challenges facing companies like Google as they try to stay ahead of the curve.
- Open-source development is unstoppable and has never been more alive! 🥳
Diving Into the Document
A leaked Google document titled “We Have No Moat, And Neither Does OpenAI” has recently garnered attention. Shared anonymously on a public Discord server, the document comes from a Google researcher and offers a frank analysis of the AI development landscape.
The document contends that open-source AI development is prevailing, leaving Google and other companies without a competitive edge.
Considering Google’s status as an AI leader and its substantial investments, this is a notable claim.
💡 Quote: “But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.”
Here are some interesting developments in the open-source community:
- Offline Fast LLMs: As reported in a recent Finxter article, many large language models can now be run offline. A Twitter user even shared how he ran a foundation model on a Pixel 6 at 5 tokens per second!
- Scalable Personal AI: Projects like Alpaca-LoRA allow you to fine-tune a personalized AI on your laptop in a couple of hours.
- Multimodality: Researchers are releasing new multimodal models that are trained in less than one hour and are freely available via GitHub. Here’s the paper.
- Responsible Release: Many new websites list pre-trained LLMs for text generation. Other websites now share generative art models, open alternatives to Midjourney or DALL-E, without restrictions.
The researcher suggests that instead of competing with open-source AI, Google and other companies should concentrate on creating tools and infrastructure to support it. This strategy would ensure rapid AI advancements and widespread benefits.
Check out this wonderful analysis from the document:
💡 Quote: “Many of the new ideas are from ordinary people. The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop.”
The leak has sparked significant debate within the AI community, with some criticizing Google for not adequately supporting open-source AI and others lauding the company for recognizing its own limitations.
LoRA – An Innovation Worth Keeping In Mind
Low-Rank Adaptation of Large Language Models (LoRA) is a powerful technique we should focus on more.
LoRA works by representing model updates as low-rank factorizations, which makes them much smaller and faster to process. This makes it possible to improve a language model quickly on consumer hardware, which is great for incorporating new and diverse information in near real time. According to the document, this technology could help Google’s most ambitious projects, yet it is underexploited inside the company.
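To make the size argument concrete, here is a minimal sketch of a low-rank update in plain Python. It assumes the common LoRA formulation in which a frozen weight matrix W is adapted as W + (alpha / rank) · B · A. The helper names (`matmul`, `lora_update`, `param_counts`) and the toy dimensions are illustrative assumptions, not taken from the leaked document or from any particular library.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_update(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying W."""
    scale = alpha / rank
    delta = matmul(B, A)  # same shape as W, but built from two thin matrices
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def param_counts(d_out, d_in, rank):
    """Trainable parameters for a full update vs. a rank-`rank` update."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora

# Toy example: a 6x4 weight matrix adapted with rank 2 (hypothetical sizes).
random.seed(0)
d_out, d_in, rank = 6, 4, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # rank x d_in
B = [[0.0] * rank for _ in range(d_out)]  # d_out x rank, starts at zero

# With B at zero, the adapted model is identical to the base model.
W_adapted = lora_update(W, A, B, alpha=4, rank=rank)

# For a 4096x4096 layer at rank 8, the update is 256 times smaller.
full, lora = param_counts(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256
```

Only A and B are trained, so the file you share is the small pair of matrices rather than a full copy of the model.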
Retraining models from scratch is difficult and time-consuming.
LoRA is effective in part because it is stackable: improvements like instruction tuning can be applied first and then built on as other contributors add further capabilities. These improvements accumulate to make the model better over time without needing to start from scratch.
This means that when new data or tasks become available, the model can be updated quickly and cheaply. On the other hand, starting from scratch wastes previous improvements and becomes very expensive.
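The idea of layering improvements can be sketched in a few lines of Python. This is a simplified illustration that assumes each fine-tuning step is stored as a low-rank pair (B, A) with a scale factor and is merged additively into the base weights. The adapter names and the tiny 2x2 matrices are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_adapters(W, adapters):
    """Fold a sequence of (B, A, scale) low-rank updates into a copy of W."""
    merged = [row[:] for row in W]  # leave the base weights untouched
    for B, A, scale in adapters:
        delta = matmul(B, A)
        for i, d_row in enumerate(delta):
            for j, d in enumerate(d_row):
                merged[i][j] += scale * d
    return merged

# Base weights (hypothetical 2x2 layer).
W = [[1.0, 0.0],
     [0.0, 1.0]]

# Two rank-1 adapters produced at different times, possibly by different people.
instruction_adapter = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)
dialogue_adapter = ([[0.0], [1.0]], [[3.0, 0.0]], 1.0)

# The first improvement is applied, then the second is added on top of it.
step_1 = merge_adapters(W, [instruction_adapter])
step_2 = merge_adapters(step_1, [dialogue_adapter])
print(step_2)  # [[1.0, 1.0], [3.0, 1.0]]
```

In this additive sketch the base model is never retrained. Each new adapter costs only its own small training run, which is the contrast with starting from scratch drawn above.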
We should think carefully about whether we need a new model for every new idea. If we have major improvements that make reusing old models impossible, we should still try to keep as much of the previous model’s abilities as possible.
I couldn’t resist adding this interesting quote from the document:
💡 Quote: “LoRA updates are very cheap to produce (~$100) for the most popular model sizes. This means that almost anyone with an idea can generate one and distribute it. Training times under a day are the norm. At that pace, it doesn’t take long before the cumulative effect of all of these fine-tunings overcomes starting off at a size disadvantage. Indeed, in terms of engineer-hours, the pace of improvement from these models vastly outstrips what we can do with our largest variants, and the best are already largely indistinguishable from ChatGPT. Focusing on maintaining some of the largest models on the planet actually puts us at a disadvantage.”
Timeline of LLM Developments (Overview)
Feb 24, 2023 – Meta launches LLaMA in various model sizes, open-sourcing the code but not the weights.
March 3, 2023 – LLaMA’s weights are leaked, allowing anyone to experiment with the model.
March 12, 2023 – Artem Andreenko runs LLaMA on a Raspberry Pi.
March 13, 2023 – Stanford releases Alpaca, enabling low-cost fine-tuning of LLaMA.
March 18, 2023 – Georgi Gerganov runs LLaMA on a MacBook CPU using 4-bit quantization.
March 19, 2023 – Vicuna, a cross-university collaboration, achieves “parity” with Bard at $300 training cost.
March 25, 2023 – Nomic creates GPT4All, an ecosystem for models like Vicuna, at $100 training cost.
March 28, 2023 – Open Source GPT-3 by Cerebras outperforms existing GPT-3 clones.
March 28, 2023 – LLaMA-Adapter introduces instruction tuning and multimodality with just 1.2M learnable parameters.
April 3, 2023 – Berkeley launches Koala; compared to ChatGPT, users either prefer Koala or have no preference 50% of the time.
April 15, 2023 – Open Assistant launches a model and dataset for Alignment via RLHF, achieving near-ChatGPT human preference levels.
💡 Recommended: 6 New AI Projects Based on LLMs and OpenAI
Competing with Open-Source is a Losing Game
Open-source AI development is a better approach than closed-source AI development, particularly when considering the potential of Artificial General Intelligence (AGI). The open-source approach fosters collaboration, accessibility, and transparency, while promoting rapid development, preventing monopolies, and ensuring that the benefits are widely shared.
Here are a few reasons why I think open-source AI development should win in the long term:
Collaboration is key in open-source AI, as researchers and developers from diverse backgrounds work together to innovate, increasing the likelihood of AGI breakthroughs.
Open-source AI is accessible to anyone, regardless of location or financial resources, which encourages a broader range of perspectives and expertise.
Transparency in open-source AI allows researchers to address biases and ethical concerns, fostering responsible AI development.
By building upon existing work, developers can rapidly advance AI technologies, bringing us closer to AGI.
Open-source AI also reduces the risk of single organizations dominating the AI landscape, ensuring that advancements serve the greater good.
Additionally, the benefits of AI are more evenly distributed across society through open-source AI, preventing the concentration of power and wealth.
Lastly, open-source AI development improves the security of AI systems, as potential flaws can be discovered and fixed by a larger community of researchers and developers.
Let’s end this article with another great quote from the document:
💡 Quote: “Google and OpenAI have both gravitated defensively toward release patterns that allow them to retain tight control over how their models are used. But this control is a fiction. Anyone seeking to use LLMs for unsanctioned purposes can simply take their pick of the freely available models.”
Feel free to share this article with your friends ♥️ and download our OpenAI Python API Cheat Sheet and the following “Glossary” of modern AI terms:
OpenAI Glossary Cheat Sheet (100% Free PDF Download) 👇
Finally, check out our free cheat sheet on OpenAI terminology. Many Finxters have told me they love it! ♥️
💡 Recommended: OpenAI Terminology Cheat Sheet (Free Download PDF)
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.