OpenAI Q*: Data May Not Be the Bottleneck Anymore. AI Compute Is! (NVDA, TSLA, AMZN)


A few days ago, I sent you a highly speculative article on rumors about OpenAI’s breakthrough technology called Q*. I admitted that the information was very thin.

But in a fresh exclusive with The Verge, Sam Altman dropped the following bombshell:

  • Interviewer: The reports about the Q* model breakthrough that you all recently made, what’s going on there?
  • Sam Altman: No particular comment on that unfortunate leak. But what we have been saying – two weeks ago, what we are saying today, what we’ve been saying a year ago, what we were saying earlier on – is that we expect progress in this technology to continue to be rapid and also that we expect to continue to work very hard to figure out how to make it safe and beneficial.

By my logic: if Sam calls it a leak, it must be real, right?

Diving deeper into the topic, I found the following interesting nugget:

“The […] breakthrough allowed OpenAI to overcome limitations on obtaining enough high-quality data to train new models […] The research involved using computer-generated [data], rather than real-world data like text or images pulled from the internet, to train new models. That appears to be a reference to the idea of training algorithms with so-called synthetic training data, which has emerged as a way to train more powerful AI models.” Wired

Just as Tesla does for its new end-to-end Full Self-Driving (FSD) system, OpenAI now appears to be using synthetic data to train neural nets!

I share this with you because I believe the following take is not well-understood by the general public:

  • The AI Scaling Laws suggest we have not reached saturation in performance – all we need to do is throw more compute and more data at the problem.
  • Many argue that data is the limiting factor – e.g., in humanoid robotics, we don’t have a lot of real-world data.
  • However, if we can effectively train models on synthetic data, we’ve mitigated this bottleneck!
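
To make the scaling-law argument concrete, here is a minimal sketch. It assumes the Chinchilla-style loss formula and fitted constants from Hoffmann et al. (2022) and the common "6 × parameters × tokens" rule of thumb for training compute. None of this comes from OpenAI or the reports quoted above, and the function names are my own. It only illustrates the shape of the trade-off: for a fixed model size, more training tokens (real or synthetic) lower the estimated loss, but the compute bill grows in step.

```python
# Illustrative sketch only. Assumed formula (Chinchilla-style):
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants are the fits published by Hoffmann et al. (2022); treating them as
# valid for any other model family is an assumption, not a claim from the article.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def estimated_loss(n_params: float, n_tokens: float) -> float:
    """Estimated training loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: roughly 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens


if __name__ == "__main__":
    n_params = 70e9  # hypothetical 70B-parameter model
    for n_tokens in (1.4e12, 1.4e13, 1.4e14):  # 1x, 10x, 100x the data
        loss = estimated_loss(n_params, n_tokens)
        flops = training_flops(n_params, n_tokens)
        print(f"tokens={n_tokens:.1e}  est. loss={loss:.3f}  compute={flops:.2e} FLOPs")
```

Under these assumptions, every 10x increase in data also means a 10x increase in training compute. That is why removing the data bottleneck moves the pressure onto compute.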

We may already be seeing double-digit percentage productivity improvements in parts of our economy due to LLMs. I expect this trend to continue for at least a couple of years. Maybe decades!

Have a look at this research result in a recent first-class paper “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models”:

🧑‍💻 “Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. […]”

Another big takeaway is this: when synthetic data is an effective option, the bottleneck for AI will be compute. The singularity will take our hunger for compute to the moon.

10x, 100x, 1000x of demand for AI compute in a few years wouldn’t surprise me in the least.

Bullish for AI compute providers like NVDA, AMZN, and TSLA!

Be on the right side of change! 🚀

🔗 Original Publication in my newsletter. Join us free!