The Evolution of Large Language Models (LLMs): Insights from GPT-4 and Beyond

Playing with any large language model (LLM), such as GPT-4, is fascinating.

But it doesn’t give you an accurate understanding of where AGI is heading because one isolated snapshot provides limited information. You can gain more insight into the growth and dynamicity of LLMs by comparing two subsequent snapshots.

Roughly speaking, it’s less interesting to see where baby AGI is and more interesting to look at how it evolves.

To gain more insight on this, Emily has just contributed another interesting Finxter blog article:

👩‍💻 Recommended: [Blog] 10 High-IQ Things GPT-4 Can Do That GPT-3.5 Can’t

Check it out. It’s a solid read! ⭐

It’s fascinating to observe how the concept of transformers introduced in the 2017 paper “Attention is all you need” has scaled so remarkably well.

In essence, the significant advancements made in AI over the past four years have mostly come from scaling up the transformer approach to an incredible magnitude. The concept of GPT (Generative Pre-trained Transformers) has remained largely unchanged for around six years.

They just threw more data and more hardware on the same algorithm. This was possible due to the higher amount of scalability and degree of parallelization unlocked by the transformer idea.

From the paper (highlights by me):

🚀 “In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization … the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.”

⚡ My main takeaway from comparing GPT-3.5 to GPT-4 is that the limits of performance improvements are not yet reached by simply throwing more and more data and hardware on these models. And when the performance (=IQ) of transformer models ultimately converges — probably at a super-human IQ level — we’ll still be able to change and improve on the underlying abstractions to eke out additional IQ.

Likely, transformers will not remain the last and best-performing model for all future AI research. We have tried only the tip of the iceberg on what scale these models go. I wouldn’t be surprised if the data sets and computational power of future GPT models increased by 1,000,000x.

Truly an exciting time to be alive! 🤖

I’m scared and fascinated at the same time. It’s so new and so dangerous. Ubiquitous disruption of the work marketplace is already happening fast. I’d estimate that in our economy, we already have north of one billion “zombie jobs”, i.e., job descriptions that could be fully automated with ChatGPT and code. I know of closed-loop AI models under government review that classify cancer with almost zero error rate. Medical doctors with lower accuracy are still doing the classification – but for how long?

A new era is starting. When we went from 99% to 1% farmers, we accomplished a massive leap of free work energy that led to an explosion of collective intelligence. The same is happening now: 99% of the jobs will be gone sooner than we expect. A massive amount of free energy will catapult humanity forward like we’ve never experienced in the history of humanity.

Buckle up for the ride. I’ll be here to help you navigate the waters until my job will be disrupted too and AGI will help you more effectively than I ever could.

The future is bright! 🚀🌞

Chris

This was part of my free newsletter on technology and exponential technologies. You can join us by downloading our cheat sheets here: