Alien Technology: Catching Up on LLMs, Prompting, ChatGPT Plugins & Embeddings


What Is an LLM?

💡 From a technical standpoint, a large language model (LLM) can be seen as a massive file on a computer, containing billions or even trillions of numerical values, known as parameters. These parameters are fine-tuned through extensive training on diverse datasets, capturing the statistical properties of human language.

However, such a dry description hardly does justice to the magic of LLMs. From another perspective, they function almost like an oracle. You call upon them with a query, such as llm("What is the purpose of life"), and they may respond with something witty, insightful, or enigmatic, like "42" (a humorous nod to Douglas Adams’ The Hitchhiker’s Guide to the Galaxy).

By the way, you can check out my article on using LLMs like this in the command line here: 👇

💡 Recommended: How to Run Large Language Models (LLMs) in Your Command Line?

Isn’t it wild to think about how Large Language Models (LLMs) can turn math into something almost magical? It’s like they’re blending computer smarts with human creativity, and the possibilities are just getting started.

Now, here’s where it gets really cool.

These LLMs take all kinds of complex patterns and knowledge and pack them into binary files full of numbers. We don’t really understand what these individual numbers represent, but together they encode a deep understanding of the world. LLMs are densely compressed human wisdom, knowledge, and intelligence. Now imagine having these files and being able to copy them millions of times, running them all at once.

It’s like having a huge team of super-smart people, but they’re all in your computer.

So picture this: Millions of brainy helpers in your pocket, working day and night on anything you want.

πŸ‘¨β€βš•οΈ You know how doctors are always trying to figure out the best way to treat illnesses? Imagine having millions of super-smart helpers to quickly find the answers.

πŸ“ˆ Or think about your savings and investments; what if you had a team of top financial experts guiding you 24/7 to make the smartest choices with your money?

🏫 And for kids in school, picture having a personal tutor for every subject, making sure they understand everything perfectly. LLMs is like having an army of geniuses at your service for anything you need.

LLMs, what Simon Willison calls “alien technology”, have brought us closer to solving the riddle of intelligence itself, turning what was once the exclusive domain of human cognition into something that can be copied, transferred, and harnessed like never before.

I’d go as far as to say that the age-old process of reproducing human intelligence has been transcended. Intelligence is solved. LLMs will only become smarter from now on. Like the Internet, LLMs will stay and proliferate and penetrate every single sector of our economy.

How Do LLMs Work?

The underlying mechanism of Large Language Models (LLMs) might seem almost counterintuitive when you delve into how they operate. At their core, LLMs are essentially word-prediction machines, fine-tuned to anticipate the most likely next word (more precisely: token) in a sequence.
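To make the “word-prediction machine” idea concrete, here is a minimal toy sketch: a bigram model that always predicts the statistically most likely next word from a tiny made-up corpus. A real LLM does this over billions of parameters and subword tokens, not whole words, but the core loop of “predict the next token” is the same:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the terabytes of text a real LLM is trained on.
corpus = "the purpose of life is to learn the purpose of life is to grow".split()

# Count how often each word follows each other word (bigram statistics).
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    """Return the word that most often follows `word` in the corpus."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("purpose"))  # 'of' -- the only word that ever follows it
```

An LLM is this idea scaled up astronomically: instead of counting bigrams, it learns a deep statistical model of which token comes next in any context.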

For example, consider ChatGPT’s LLM chat interface, which has reached product-market fit and is used by hundreds of millions of users. The ingenious “hack” that allows LLMs to participate in a chat interface is all about how the input is framed. In essence, the model isn’t inherently conversing with a user; it’s continuing a text, based on a conversational pattern it has learned from vast amounts of data.

Consider this simplified example:

You are a helpful assistant

User: What is the purpose of life?
Assistant: 42
User: Can you elaborate?

Here’s what’s happening under the hood:

  1. Setting the Scene: The introductory line, "You are a helpful assistant", sets a context for the LLM. It provides an instruction to guide its responses, influencing its persona.
  2. User Input: The following lines are framed as a dialogue, but to the LLM, it’s all part of a text it’s trying to continue. When the user asks, "What is the purpose of life?" the LLM looks at this as the next part of a story, or a scene in a play, and attempts to predict the next word or phrase that makes the most sense.
  3. Assistant Response: The assistant’s response, "42" is the model’s guess for the next word, given the text it has seen so far. It’s a clever completion, reflecting the model’s training on diverse and creative texts. In the second run, however, the whole conversation is used as input and the LLM just completes the conversation.
  4. Continuing the Conversation: When the user follows up with "Can you elaborate?" the LLM is once again seeking to continue the text. It’s not consciously leading a conversation but following the statistical patterns it has learned, which, in this context, would typically lead to an elaboration.

The magic is in how all these elements come together to create an illusion of a conversation. In reality, the LLM doesn’t understand the conversation or its participants. It’s merely predicting the next word, based on an intricately crafted pattern.
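This framing is easy to reproduce in code. The sketch below flattens a chat into one plain string that a completion model would simply continue; the role labels and template are illustrative, not OpenAI’s actual internal format:

```python
def build_prompt(system, messages):
    """Flatten a chat into one text that the model simply continues."""
    lines = [system, ""]
    for role, text in messages:
        lines.append(f"{role}: {text}")
    lines.append("Assistant:")  # the LLM predicts the words after this marker
    return "\n".join(lines)

prompt = build_prompt(
    "You are a helpful assistant",
    [("User", "What is the purpose of life?"),
     ("Assistant", "42"),
     ("User", "Can you elaborate?")],
)
print(prompt)
```

Everything the model “knows” about the conversation is contained in this one string; each new user message just rebuilds the string and asks for the next continuation.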

This “dirty little hack” transforms a word-prediction engine into something that feels interactive and engaging, demonstrating the creative application of technology and the power of large-scale pattern recognition. It’s a testament to human ingenuity in leveraging statistical learning to craft experiences that resonate on a very human level.

💡 Prompt Engineering is a clever technique used to guide the behavior of Large Language Models (LLMs) by crafting specific inputs, or prompts, that steer the model’s responses. It’s akin to creatively “hacking” the model to generate desired outputs.

For example, if you want the LLM to act like a Shakespearean character, you might begin with a prompt like "Thou art a poet from the Elizabethan era". The model, recognizing the pattern and language style, will respond in kind, embracing a Shakespearean tone.

This trickery through carefully designed prompts transforms a word-prediction machine into a versatile and interactive tool that can mimic various styles and tones, all based on how you engineer the initial prompt.
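In practice, this kind of prompt engineering often boils down to a string template that prepends the persona to every query. A minimal sketch, where the persona text and the "Poet:" label are just examples:

```python
SHAKESPEARE = "Thou art a poet from the Elizabethan era."

def engineer_prompt(user_query, persona=SHAKESPEARE):
    """Steer the model's tone by prepending a persona instruction."""
    return f"{persona}\n\nUser: {user_query}\nPoet:"

# The same query yields very different completions depending on the persona.
print(engineer_prompt("Describe a sunrise."))
```

Swapping the persona string is all it takes to turn the same underlying word predictor into a lawyer, a pirate, or a Python tutor.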

Prompt Engineering with Python and OpenAI

You can check out the whole course on OpenAI Prompt Engineering using Python on the Finxter Academy. We cover topics such as:

  • Embeddings
  • Semantic search
  • Web scraping
  • Query embeddings
  • Movie recommendation
  • Sentiment analysis

πŸ‘¨β€πŸ’» Academy: Prompt Engineering with Python and OpenAI

What’s the Secret of LLMs?

The secret to the magical capabilities of Large Language Models (LLMs) seems to lie in a simple and perhaps surprising element: scale. 👇

The colossal nature of these models is both their defining characteristic and the key to their unprecedented performance.

Tech giants like Meta, Google, and Microsoft have dedicated immense resources to developing LLMs. How immense? We’re talking about millions of dollars spent on cutting-edge computing power and terabytes of textual data to train these models. It’s a gargantuan effort that converges in a matrix of numbers, the model’s parameters, that represent the learned patterns of human language.

The scale here isn’t just large; it’s virtually unprecedented in computational history. These models consist of billions or even trillions of parameters, fine-tuned across diverse and extensive textual datasets. By throwing such vast computational resources at the problem, these corporations have been able to capture intricate nuances and create models that understand and generate human-like text.

However, this scale comes with challenges, including the enormous energy consumption of training such models, the potential biases embedded in large-scale data, and the barrier to entry for smaller players who can’t match the mega corporations’ resources.

The story of LLMs is a testament to the “bigger is better” philosophy in the world of artificial intelligence. It’s a strategy that seems almost brute-force in nature but has led to a qualitative leap in machine understanding of human language. It illustrates the power of scale, paired with ingenuity and extensive resources, to transform a concept into a reality that pushes the boundaries of what machines can achieve.

Attention Is All You Need

The 2017 Google paper “Attention Is All You Need” marked a significant turning point in the world of artificial intelligence. It introduced the transformer, a novel architecture that is uniquely scalable, allowing training to be run across many computers in parallel, both efficiently and easily.

This was not just a theoretical breakthrough but a practical realization that the model could continually improve with more and more compute and data.

💡 Key Insight: By using unprecedented amounts of compute on unprecedented amounts of data with a simple neural network architecture (transformers), intelligence seems to emerge as a natural phenomenon.

Unlike other algorithms that may plateau in performance, transformers seemed to exhibit emergent properties that nobody fully understood at the time. They could understand intricate language patterns, even developing coding-like abilities. The more data and computational power thrown at them, the better they seemed to perform. They didn’t converge or flatten out in effectiveness with increased scale, a behavior that was both fascinating and mysterious.

OpenAI, under the guidance of Sam Altman, recognized the immense potential in this architecture and decided to push it farther than anyone else. The result was a series of models, culminating in state-of-the-art transformers, trained on an unprecedented scale. By investing in massive computational resources and extensive data training, OpenAI helped usher in a new era where large language models could perform tasks once thought to be exclusively human domains.

This story highlights the surprising and yet profound nature of innovation in AI.

A simple concept, scaled to extraordinary levels, led to unexpected and groundbreaking capabilities. It’s a reminder that sometimes, the path to technological advancement isn’t about complexity but about embracing a fundamental idea and scaling it beyond conventional boundaries. In the case of transformers, scale was not just a means to an end but a continually unfolding frontier, opening doors to capabilities that continue to astonish and inspire.

Ten Tips to Use LLMs Effectively

As powerful and versatile as Large Language Models (LLMs) are, harnessing their full potential can be a complex endeavor.

Here’s a series of tricks and insights to help tech enthusiasts like you use them effectively:

  1. Accept that No Manual Exists: There’s no step-by-step guide to mastering LLMs. The field is still relatively new, and best practices are continually evolving. Flexibility and a willingness to experiment are essential.
  2. Iterate and Refine: Don’t reject the model’s output too early. Your first output might not be perfect, but keep iterating. Anyone can get an answer from an LLM, but extracting good answers requires persistence and refinement. You can join our prompt engineering beginner and expert courses to push your own understanding to the next level.
  3. Leverage Your Domain Knowledge: If you know coding, use LLMs to assist with coding tasks. If you’re a marketer, apply them for content generation. Your expertise in a particular area will allow you to maximize the model’s capabilities.
  4. Understand How the Model Works: A rough understanding of the underlying mechanics can be immensely beneficial. Following tech news, like our daily Finxter emails, can keep you informed and enhance your ability to work with LLMs.
  5. Gain Intuition by Experimenting: Play around with different prompts and settings. Daily hands-on practice can lead to an intuitive feel for what works and what doesn’t.
  6. Know the Training Cut-off Date: Different models have different cut-off dates. For example, OpenAI’s GPT-3.5 models were trained on data up to September 2021, while Anthropic’s Claude 2 and Google’s PaLM 2 are more recent. This can affect the accuracy and relevance of the information they provide.
  7. Understand Context Length: Models have limits on the number of tokens (word fragments, roughly four characters of English text each) they can process at once: about 4,096 for GPT-3.5, 8,192 for the base GPT-4 model, and 100k for Claude 2. Tailoring your input to these constraints will yield better results.
  8. Develop a “Sixth Sense” for Hallucinations: Sometimes, LLMs may generate information that seems plausible but is incorrect or hallucinated. Developing an intuition for recognizing and avoiding these instances is key to reliable usage.
  9. Stay Engaged with the Community: Collaborate with others, join forums, and stay abreast of the latest developments. The collective wisdom of the community is a powerful asset in mastering these technologies.
  10. Be Creative: Prompt the model for creative ideas (e.g., "Give me 20 ideas on X"). The first answers might be obvious, but further down the list, you might find a spark of brilliance.
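Tip 7 can be operationalized with a rough rule of thumb: one token is about four characters of English text. The sketch below uses that approximation; it is an estimate only, not a real tokenizer like tiktoken:

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context(text, limit=8_000):
    """Check whether a prompt likely fits a model's context window."""
    return estimate_tokens(text) <= limit

prompt = "Summarize the following report. " * 100
print(estimate_tokens(prompt), fits_context(prompt))
```

For production use you would count tokens exactly with the model’s own tokenizer, but a quick estimate like this is often enough to decide whether to chunk your input.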

Retrieval Augmented Generation

💡 Retrieval Augmented Generation (RAG) represents an intriguing intersection between the vast capabilities of Large Language Models (LLMs) and the power of information retrieval. It’s a technique that marries the best of both worlds, offering a compelling approach to generating information and insights.

Here’s how it works and why it’s making waves in the tech community:

What is Retrieval Augmented Generation?

RAG is a method that, instead of directly training a model on specific data or documents, leverages the vast information already available on the internet. By searching for relevant content, it pulls this information together and uses it as a foundation for asking an LLM to generate an answer.

Figure: Example of a simple RAG procedure pasting Wikipedia data into the context of a ChatGPT LLM prompt to extract useful information.

How Does RAG Work?

  1. Search for Information: First, a search is conducted for content relevant to the query or task at hand. This could involve scouring databases, the web, or specialized repositories.
  2. Prepend the Retrieved Data: The content found is then prepended to the original query or prompt. Essentially, it’s added to the beginning of the question or task you’re posing to the LLM.
  3. Ask the Model to Answer: With this combined prompt, the LLM is then asked to generate an answer or complete the task. The prepended information guides the model’s response, grounding it in the specific content retrieved.
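The three steps above can be sketched in a few lines of Python. This toy version retrieves by simple word overlap instead of a real search engine, and stops just before the actual LLM call; the final string is what you would send as the prompt:

```python
documents = [
    "The Eiffel Tower is 330 metres tall and located in Paris.",
    "Python was created by Guido van Rossum and released in 1991.",
    "The Great Wall of China is over 21,000 kilometres long.",
]

def retrieve(query, docs):
    """Step 1: pick the document sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(docs, key=lambda d: len(query_words & set(d.lower().split())))

def build_rag_prompt(query, docs):
    """Steps 2 and 3: prepend the retrieved context, then ask the model."""
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

print(build_rag_prompt("When was Python released?", documents))
```

Real systems replace the word-overlap retriever with a search engine or a vector database, but the prompt-assembly step looks much like this.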

Why is RAG Valuable?

  • Customization: It allows for tailored responses based on real-world data, not just the general patterns an LLM has learned from its training corpus.
  • Efficiency: Rather than training a specialized model, which can be costly and time-consuming, RAG leverages existing models and augments them with relevant information.
  • Flexibility: It can be applied to various domains, from coding to medical inquiries, by merely adapting the retrieval component to the area of interest.
  • Quality: By guiding the model with actual content related to the query, it often results in more precise and contextually accurate responses.

Retrieval Augmented Generation represents an elegant solution to some of the challenges in working with LLMs. It acknowledges that no model, no matter how large, can encapsulate the entirety of human knowledge. By dynamically integrating real-time information retrieval, RAG opens new horizons for LLMs, making them even more versatile and responsive to specific and nuanced inquiries.

In a world awash with information, the fusion of search and generation through RAG offers a sophisticated tool for navigating and extracting value. Here’s my simple formula for RAG: 👇

USEFULNESS ~ Intelligence * Information

The more intelligent the model and the more relevant the information in its context, the more useful the output.

Let’s examine an advanced and extremely powerful technique to provide helpful context to LLMs and, thereby, get the most out of them: 👇

Embeddings and Vector Search: A Special Case of Retrieval Augmented Generation (RAG)

In the broader context of RAG, a specialized technique called “Embeddings and Vector Search” takes text-based exploration to a new level, allowing for the construction of semantic search engines that leverage the capabilities of LLMs.

Here’s how it works:

Transforming Text into Embeddings

  1. Text to Vector Conversion: Any string of text, be it a sentence, paragraph, or document, can be transformed into an array of floating-point numbers, or an “embedding”. This embedding encapsulates the semantic meaning of the text based on the LLM’s mathematical model of human language.
  2. Dimensionality: These embeddings are positioned in a high-dimensional space, e.g., 1,536 dimensions. Each dimension represents a specific aspect of the text’s semantic content, allowing for a nuanced representation.

Example: Building a Semantic Search Engine

  1. Cosine Similarity Distance: To find the closest matches to a given query, the cosine similarity distance between vectors is calculated. This metric measures how closely the semantic meanings align between the query and the existing embeddings.
  2. Combining the Brain (LLM) with Application Data (Embedding): By pairing the vast understanding of language embedded in LLMs with specific application data through embeddings, you create a bridge between generalized knowledge and specific contexts.
  3. Retrieval and Augmentation: The closest matching embeddings are retrieved, and the corresponding text data is prepended to the original query. This process guides the LLM’s response, just as in standard RAG.
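Here is a toy sketch of that pipeline. The three-dimensional “embeddings” are made up for illustration (real ones, e.g. from OpenAI’s text-embedding-ada-002, have 1,536 dimensions), but the cosine similarity math is the same:

```python
import math

# Toy corpus with made-up 3-dimensional "embeddings" (real ones have ~1,536).
corpus = {
    "How do I bake bread?":      [0.9, 0.1, 0.0],
    "Best sourdough recipe":     [0.8, 0.2, 0.1],
    "Fixing a Python TypeError": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def semantic_search(query_vector, corpus):
    """Return the stored text whose embedding best matches the query vector."""
    return max(corpus, key=lambda text: cosine_similarity(query_vector, corpus[text]))

# A query embedding close to the "baking" region of the space.
print(semantic_search([0.85, 0.15, 0.05], corpus))
```

In a real application, the query vector would come from the same embedding model as the corpus vectors, and a vector database would handle the nearest-neighbor search at scale.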

Why is this Technique Important?

You can use embeddings as input to LLM prompts to provide context in a highly condensed and efficient form. This solves one half of the problem of using LLMs effectively!

  • Precision: It offers a finely-tuned mechanism for retrieving content that semantically resonates with a given query.
  • Scalability: The method can be applied to vast collections of text, enabling large-scale semantic search engines.
  • Customization: By building embeddings from specific data sources, the search process can be tailored to the unique needs and contexts of different applications.

💡 Embeddings are a powerful extension of the RAG paradigm, enabling a deep, semantic understanding of text. By translating text into numerical vectors and leveraging cosine similarity, this technique builds bridges between the abstract mathematical understanding of language within LLMs and the real-world applications that demand precise, context-aware responses.

Using embeddings in OpenAI is as simple as running the following code:

import openai

response = openai.Embedding.create(
    input="Your text string goes here",
    model="text-embedding-ada-002"
)
embeddings = response['data'][0]['embedding']

Possible output:

{
  "data": [
    {
      "embedding": [
        ...
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
If you want to dive deeper into embeddings, I recommend checking out our blog post and the detailed OpenAI guide!

💡 Recommended: What Are Embeddings in OpenAI?

ChatGPT Plugins

OpenAI has recently announced the initial support for plugins in ChatGPT. As part of the gradual rollout of these tools, the intention is to augment language models with capabilities that extend far beyond their existing functionalities.

💡 ChatGPT plugins are tools specifically designed for language models to access up-to-date information, run computations, or use third-party services such as Expedia, Instacart, Shopify, Slack, Wolfram, and more.

The implementation of plugins opens up a vast range of possible use cases. From giving parents superpowers with Milo Family AI to enabling restaurant bookings through OpenTable, the potential applications are expansive. Examples like searching for flights with KAYAK or ordering groceries from local stores via Instacart highlight the practical and innovative utilization of these plugins.

OpenAI is also hosting two plugins, a web browser and a code interpreter (see below), to broaden the model’s reach and increase its functionality. An experimental browsing model will allow ChatGPT to access recent information from the internet, further expanding the content it can discuss with users.

💡 Recommended: Top 5 LLM Python Libraries Like OpenAI, LangChain, Pinecone

ChatGPT Code Interpreter: What Is It and How Does It Work?

The ChatGPT Code Interpreter is a revolutionary feature added to OpenAI’s GPT-4 model, enabling users to execute Python code within the ChatGPT environment.

It functions as a sandboxed Python environment where tasks ranging from PDF conversion using OCR to video trimming and mathematical problem-solving can be carried out.

Users can upload local files in various formats, including TXT, PDF, JPEG, and more, as the Code Interpreter offers temporary disk space and supports over 300 preinstalled Python packages.

Whether it’s data analysis, visualization, or simple file manipulations, the Code Interpreter facilitates these actions within a secure, firewalled environment, transforming the chatbot into a versatile computing interface.

Accessible to ChatGPT Plus subscribers, this feature amplifies the range of possibilities for both coders and general users, blending natural language interaction with direct code execution.

Here’s a list of tasks that can be solved by Code Interpreter that were previously solved by specialized data scientists:

  1. Explore Your Data: You can upload various data files and look into them. It’s a handy way to see what’s going on with your numbers.
  2. Clean Up Your Data: If your data’s a little messy, you can tidy it up by removing duplicates or filling in missing parts.
  3. Create Charts and Graphs: Visualize your data by making different types of charts or graphs. It’s a straightforward way to make sense of complex information.
  4. Try Out Machine Learning: Build your own machine learning models to predict outcomes or categorize information. It’s a step into the more advanced side of data handling.
  5. Work with Text: Analyze texts to find out what’s being said or how it’s being expressed. It’s an interesting dive into natural language processing.
  6. Convert and Edit Files: Whether it’s PDFs, images, or videos, you can convert or modify them as needed. It’s quite a practical feature.
  7. Gather Data from Websites: You can pull data directly from web pages, saving time on collecting information manually.
  8. Solve Mathematical Problems: If you have mathematical equations or problems, you can solve them here. It’s like having a calculator that can handle more complex tasks.
  9. Experiment with Algorithms: Write and test your algorithms for various purposes. It’s a useful way to develop custom solutions.
  10. Automate Tasks: If you have repetitive or routine tasks, you can write scripts to handle them automatically.
  11. Edit Images and Videos: Basic editing of images and videos is possible, allowing for some creative applications.
  12. Analyze IoT Device Data: If you’re working with Internet of Things (IoT) devices, you can analyze their data in this environment.

Here’s an example run in my ChatGPT environment:

Yay! You can now run Python code and plotting scripts in your ChatGPT environment!

If you click on the β€œShow work” button above, it toggles the code that was executed:

A simple but powerful feature: using ChatGPT has now become even more compelling for coders like you and me.

To keep learning about OpenAI and Python, you can download our cheat sheet here:

🔗 Recommended: Python OpenAI API Cheat Sheet (Free)