Software Engineering 2.0 – We Need to Talk About LLMs

The integration of Large Language Models (LLMs) into software systems introduces a paradigm in which stochastic programming becomes a focal point, challenging the deterministic norms that have long been foundational to software engineering.

Traditional software engineering approaches, such as unit testing, are predicated on the ability to predict outputs given certain inputs, ensuring that the software behaves as expected across various scenarios and edge cases. However, when LLMs are introduced into the loop, the inherent unpredictability and variability of their outputs disrupt this deterministic framework. Unit testing, for instance, struggles to validate the functionality of systems that leverage LLMs, as the same input does not guarantee the same output, making it nearly impossible to define a β€œcorrect” response in all cases.
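To make this concrete, here is a minimal sketch of how an exact-match unit test turns brittle once an LLM sits inside the function under test. The `summarize` function and its canned paraphrases are purely hypothetical stand-ins; the random choice just simulates a real model wording its answer differently on every call:

```python
import random

def summarize(text: str) -> str:
    # Stand-in for a function that calls an LLM internally; the random choice
    # simulates that the same prompt can come back worded differently each run.
    return random.choice([
        "A cat sat on a mat.",
        "A cat is sitting on a mat.",
        "The text describes a cat resting on a mat.",
    ])

def test_summarize_exact_match():
    # Classic deterministic expectation: all three paraphrases above are
    # acceptable summaries, yet only one of them satisfies this assertion,
    # so the test passes or fails depending on the run.
    assert summarize("The cat sat on the mat.") == "A cat sat on a mat."
```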

A dummy example showing how integrating an LLM into a system may get you in trouble “at runtime”: πŸ‘‡
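(A minimal sketch, assuming a hypothetical `ask_llm` helper; the hard-coded reply simply simulates one unlucky answer a real model might give.)

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion call; here we simulate
    # one plausible but inconvenient reply.
    return "Sure! The discount is ten percent (10%) for loyal customers."

def apply_discount(price: float) -> float:
    reply = ask_llm("What discount applies to loyal customers? Answer with a number only.")
    discount = float(reply)               # ValueError at runtime: the model answered
    return price * (1 - discount / 100)   # in prose, not with the bare number expected

apply_discount(100.0)  # fine whenever the model returns "10", crashes on replies like this one
```

The code compiles, the happy-path demo works, and yet the system fails the first time the model decides to be chatty.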

Let’s have a look at a metaphor I read recently from Palantir’s CTO Shyam Sankar:

πŸ’«β›… From Stars to Weather Prediction: The transition from deterministic programming to integrating Large Language Models (LLMs) can be likened to the shift from the precise science of predicting stellar movements to the probabilistic and inherently chaotic realm of weather forecasting.

Traditional programming, akin to astronomical predictions, adheres to a deterministic model where specific inputs yield predictable and consistent outputs.

In contrast, LLMs introduce a level of variability and unpredictability, much like weather predictions, which, despite being informed by numerous variables and sophisticated models, can only offer probabilistic forecasts amidst the chaotic interplay of atmospheric conditions.

Thus, the incorporation of LLMs into software systems demands a reorientation of programming approaches, embracing strategies that can navigate and manage the inherent unpredictability, while also unlocking new potentials for more adaptive, dynamic, and nuanced user interactions and system responses.

With greater power comes greater responsibility (and unpredictability)!

Nothing But Nets – All the Way Down! Big Monolithic Unreadable Blocks of Opaque Code

Another example: Tesla’s approach to Full Self-Driving (FSD) technology has pivoted towards an end-to-end neural network, a departure from many traditional software engineering principles. The methodology lets Artificial Neural Networks (ANNs) autonomously decipher patterns and make driving decisions directly from visual inputs.

πŸ’‘ Recommended: Tesla Car Drives Elon Musk in Epic Livestream (FSD V12): NOTHING BUT NETS–All The Way Down!

This “photons in, driving commands out” strategy bypasses the conventional engineering approach of decomposing a complex task, like autonomous driving, into smaller, manageable sub-tasks (such as lane recognition, obstacle detection, etc.) and instead entrusts the entire decision-making process to a singular, comprehensive neural network.

Nothing but nets all the way down!
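The contrast is easiest to see in code. The toy sketch below has nothing to do with Tesla’s actual stack; it only juxtaposes a hand-decomposed pipeline with a single learned mapping, using random arrays as stand-ins for camera pixels and trained weights:

```python
import numpy as np

def drive_modular(frame: np.ndarray) -> float:
    """Software 1.0 style: hand-engineered sub-tasks composed by explicit rules."""
    half = frame.shape[1] // 2
    lane_offset = frame[:, :half].mean() - frame[:, half:].mean()  # crude 'lane detector'
    obstacle_ahead = frame.max() > 0.95                            # crude 'obstacle detector'
    steering = -0.5 * lane_offset                                  # hand-tuned control rule
    return 0.0 if obstacle_ahead else float(steering)              # braking overrides steering

def drive_end_to_end(frame: np.ndarray, weights: np.ndarray) -> float:
    """Software 2.0 style: one learned mapping from pixels to a control signal.
    Lanes, obstacles, and rules live implicitly in the weights, not in readable code."""
    return float(np.tanh(frame.flatten() @ weights))

frame = np.random.rand(4, 6)                  # stand-in for a camera frame
weights = np.random.randn(frame.size) * 0.01  # stand-in for trained parameters
print(drive_modular(frame), drive_end_to_end(frame, weights))
```

In the first function, every intermediate concept is inspectable and unit-testable; in the second, the entire decision lives inside a block of numbers.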

Tesla’s FSD code base went from 300,000 lines of explicit code to only 3,000 lines of code. All the β€œcode” is now encoded in a massive file of numbers: the network’s weights. Andrej Karpathy calls it Software 2.0, i.e., computers that code themselves through data.

πŸ§‘β€πŸ’» Karpathy: “To make the analogy explicit, in Software 1.0, human-engineered source code (e.g. some .cpp files) is compiled into a binary that does useful work.

In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in.

The process of training the neural network compiles the dataset into the binary β€” the final neural network.

In most practical applications today, the neural net architectures and the training systems are increasingly standardized into a commodity, so most of the active β€œsoftware development” takes the form of curating, growing, massaging and cleaning labeled datasets.

This is fundamentally altering the programming paradigm by which we iterate on our software, as the teams split in two: the 2.0 programmers (data labelers) edit and grow the datasets, while a few 1.0 programmers maintain and iterate on the surrounding training code infrastructure, analytics, visualizations and labeling interfaces.”
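A minimal numeric sketch of that β€œcompilation” step, with a four-example toy dataset and a two-parameter β€œarchitecture” standing in for a real neural network; the only point is that the program y β‰ˆ 2x + 1 is learned from examples rather than typed in by a programmer:

```python
import numpy as np

# (1) The dataset that defines the desirable behavior (here: y = 2x + 1).
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# (2) The architecture skeleton: y_hat = w*x + b, with the details (w, b) left blank.
w, b = 0.0, 0.0
lr = 0.05

# Training "compiles" the dataset into the weights -- the Software 2.0 "binary".
for _ in range(2000):
    y_hat = w * X + b
    grad_w = 2 * np.mean((y_hat - y) * X)
    grad_b = 2 * np.mean(y_hat - y)
    w, b = w - lr * grad_w, b - lr * grad_b

print(f"learned program: y = {w:.2f}*x + {b:.2f}")  # roughly y = 2.00*x + 1.00
```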

This monolithic model, which essentially translates raw visual inputs into actionable driving commands without human-understandable intermediary steps, contradicts principles like modularity and encapsulation, which advocate for systems to be broken down into discrete, independently operable units.

While this holistic approach may appear to defy logical structuring, it is precisely this freedom, combined with the sheer depth and dense connectivity of neural networks, that enables ANNs to adapt, learn, and optimize their performance, identifying patterns and making connections that are often non-intuitive to human engineers.

Thus, despite the challenges and the opacity of such a model, leveraging the capabilities of ANNs and LLMs in this manner becomes imperative to harness the advanced pattern recognition and decision-making capabilities that these models offer, propelling advancements in fields like autonomous driving.

10 Principles of Stochastic Programming (“Software 2.0”)

In light of this, a shift towards embracing and effectively managing the stochastic nature of LLM-integrated systems becomes imperative.

Here are ten new principles for software engineers under the “Software 2.0” paradigm, which leans towards probabilistic and somewhat chaotic systems:

  1. Data as Code: Software 2.0 emphasizes using data (e.g., datasets defining desirable behavior) as a form of code, where the model learns the optimal program from the data.
  2. Computational Homogeneity: The operations in neural networks (like matrix multiplications and activation functions) are computationally homogeneous, making them more streamlined and potentially easier to optimize at the hardware level.
  3. Agility in Performance Tuning: Software 2.0 allows for agile performance tuning, where the model can be easily scaled (in terms of complexity and computational requirements) to meet different performance criteria.
  4. End-to-End Learning: Instead of decomposing problems into sub-problems, Software 2.0 often involves training models in an end-to-end manner, learning direct mappings from inputs to outputs.
  5. Opacity and Inscrutability: Models, especially deep neural networks, do not readily reveal how they make decisions, making the system’s operations somewhat opaque.
  6. Inherent Adaptability: Neural networks can adapt and optimize their internal operations based on the data they are exposed to, without explicit instruction on how to change.
  7. Unified Model Functionality: A single model can be trained to handle multiple types of tasks or operate across various domains, leveraging shared internal representations.
  8. Implicit Knowledge Capture: The models capture and utilize knowledge from the data implicitly, without requiring explicit programming of rules or logic.
  9. Dynamic Failure and Success Modes: Software 2.0 can fail (or succeed) in unexpected and non-intuitive ways, given its learning-based nature and the potential for encountering unseen data.
  10. Continuous Learning and Adaptation: Models can potentially continue to learn and adapt over time, adjusting to new data and requirements without being explicitly reprogrammed.

These principles highlight a shift from a deterministic, rule-based approach to a more dynamic, data-driven methodology, where the “code” learns to perform tasks based on patterns in the data rather than explicit instructions provided by engineers.

To navigate this complexity, employ probabilistic testing and monitoring: accept the inherent variability of outputs and focus instead on validating their statistical properties and distributions. Rather than expecting one correct answer, validate the system based on whether its outputs fall within acceptable probabilistic bounds, ensuring that it behaves reasonably and safely despite the unpredictability.
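A hedged sketch of what such a probabilistic test can look like; `call_llm` is a hypothetical stand-in for whatever client the system actually uses, and the weighted random choice merely simulates the model’s output variability:

```python
import random

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM client; simulates mostly well-formed
    # replies with the occasional free-form answer.
    return random.choices(
        ["positive", "I'd say it's positive!"], weights=[0.98, 0.02]
    )[0]

def test_sentiment_reply_distribution(n_samples: int = 50, min_valid_rate: float = 0.9):
    allowed = {"positive", "negative", "neutral"}
    prompt = ("Classify the sentiment of: 'I love this product!' "
              "Answer with one word: positive, negative, or neutral.")
    valid = sum(call_llm(prompt).strip().lower() in allowed for _ in range(n_samples))
    # Statistical acceptance criterion instead of an exact-match assertion:
    assert valid / n_samples >= min_valid_rate, (
        f"only {valid}/{n_samples} replies were well-formed")

test_sentiment_reply_distribution()
```

The sample size and the acceptance threshold become explicit quality knobs instead of hidden assumptions about determinism.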

Moreover, it is crucial to develop systems that are inherently robust to a range of outputs and capable of managing unexpected or suboptimal responses from the LLM. This means designing architectures that degrade gracefully and keep functioning even when the model returns erroneous or low-quality responses, for example by detecting and handling such responses, rerouting system functionality, or invoking alternative deterministic logic when necessary.
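One way such a mechanism can look, sketched with a hypothetical `call_llm` stub whose body is intentionally left empty so that the example exercises the fallback path: validate the reply, retry a bounded number of times, then hand over to deterministic logic:

```python
import json
import re

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the real LLM client; unimplemented here, so
    # every call yields an unusable reply and the fallback path is exercised.
    ...

def extract_order_total(email_text: str, max_retries: int = 2) -> float | None:
    prompt = 'Return ONLY JSON like {"total": 12.34} for this email:\n' + email_text
    for _ in range(max_retries + 1):
        try:
            reply = call_llm(prompt)
            total = float(json.loads(reply)["total"])  # validate shape and type
            if total >= 0:                             # validate a simple business rule
                return total
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue                                   # malformed reply -> retry
    # Deterministic fallback when the stochastic path keeps misbehaving:
    match = re.search(r"\$\s*(\d+(?:\.\d{2})?)", email_text)
    return float(match.group(1)) if match else None

print(extract_order_total("Thanks for your order! Your total is $42.50."))  # 42.5
```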

Additionally, the value created by stochastic programming comes from the powerful, flexible, and often more human-like responses and interactions that LLMs can provide. While deterministic systems offer reliability and predictability, LLM-integrated systems can navigate and respond to a far wider range of inputs and scenarios, enabling more adaptive and dynamic interactions. This is particularly valuable in user-facing applications, where understanding and responding to diverse user queries in a natural, intuitive manner can significantly enhance user experience and engagement.


A solid approach for designing systems that integrate LLMs and sister technologies is to forget everything you’ve learned in traditional software engineering (for now) and carefully rethink each principle. Yes, some principles may still apply, but many will not!

“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”Mark Twain πŸ§‘β€πŸ’Ό

πŸ’‘ Recommended: Cross-Species Cooperation: Uniting Humans and Embodied AI through English Language