Exploring Role-Play Prompting LLMs – What Does Science Say?


Large Language Models (LLMs) can sometimes feel like alien technology. In other words, we don’t fully understand how they work or what they’re truly capable of. It’s akin to stone-age people trying to use advanced alien technology, occasionally causing it to spark, leaving us in awe of its magical capabilities.

Through billions of trials and errors, we’ve learned that some prompts are significantly more effective than others. This insight has led to the emergence of a new discipline known as prompt engineering.

In today’s article, we’ll examine one of those prompting techniques used by expert prompt engineers to improve performance: the role-play prompt.

πŸ”¨ Problem Formulation: Does the role-playing prompt really work? What evidence is there to suggest it does?

Research Paper on Better Zero-Shot Reasoning with Role-Play Prompting

In the paper “Better Zero-Shot Reasoning with Role-Play Prompting”, a group of Lenovo researchers found that

“Leveraging models such as ChatGPT and Llama 2, our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%.”

The study authors proposed a two-stage design where they first collected a number of “role feedback prompts”, i.e., responses in which the AI model acknowledges the role it is prompted to play.

Then they chose the best “role feedback” response and used it for evaluation like so:

  • Exploration: Find the best role-feedback response.
  • Exploitation: Use the best role-feedback response as a wrapper to set the role-playing context.
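The two stages above can be sketched in a few lines of Python. Note that the role-setting prompt, the candidate feedback responses, and the scoring function below are all hypothetical placeholders for illustration, not the authors’ exact prompts or evaluation procedure:

```python
# A minimal sketch of the two-stage role-play prompting design.
# All prompt wordings here are illustrative assumptions.

ROLE_SETTING = "From now on, you are an excellent math teacher."

# Stage 1 (exploration): collect several "role feedback" responses in which
# the model acknowledges the role, then keep the one that scores best.
def explore(candidate_feedbacks, score):
    return max(candidate_feedbacks, key=score)

# Stage 2 (exploitation): wrap every real question in the fixed
# role-setting / role-feedback dialogue to set the role-playing context.
def build_role_play_prompt(best_feedback, question):
    return (
        f"User: {ROLE_SETTING}\n"
        f"Assistant: {best_feedback}\n"
        f"User: {question}"
    )

candidates = [
    "Understood! As a math teacher, I'll explain each step clearly.",
    "OK.",
]
# Toy scorer: prefer feedback that actually restates the role.
best = explore(candidates, score=lambda fb: "math teacher" in fb)
prompt = build_role_play_prompt(best, "What is 17 * 24?")
```

In the paper, the scoring step is done by evaluating each candidate feedback on real benchmark questions; the toy scorer here just stands in for that selection process.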

The result was evaluated on 12 different datasets spanning four categories:

  • Arithmetic
  • Common Sense
  • Symbolic
  • Other Reasoning

Here are the four different role-setting and role-feedback prompts; you can use them as inspiration for your own prompt engineering tasks to obtain better results:

They also compared the role-playing approach with the “Chain-of-Thought” (CoT) prompting trick of asking the model to “think step by step”, a common approach to improving performance.
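To make the comparison concrete, here is a small sketch of how the three prompting styles differ in construction. The question and the role wording are made up for illustration; only “Let's think step by step.” is the standard zero-shot CoT trigger phrase:

```python
# Illustrative constructors for the three prompting styles compared
# in the paper. The exact wordings are assumptions.

def zero_shot(question):
    # Plain question, no extra instruction.
    return question

def chain_of_thought(question):
    # Standard zero-shot CoT trigger appended to the question.
    return f"{question}\nLet's think step by step."

def role_play(question, role="You are an excellent math teacher."):
    # Role-setting instruction prepended to the question.
    return f"{role}\n{question}"

q = "A train travels 60 km in 1.5 hours. What is its average speed?"
prompts = {
    "zero-shot": zero_shot(q),
    "CoT": chain_of_thought(q),
    "role-play": role_play(q),
}
```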

The results are interesting:

So, the role-playing prompt improves performance over zero-shot prompting and often over CoT prompting, which suggests you’d be better off using it all the time.

Yet, the performance difference is not so huge that it warrants the effort for all tasks.

So, if you need something done quickly, you probably won’t have the time to set the whole stage with many words and optimized role-feedback prompts.

πŸ‘‰ Recommended: Are LLMs Greedy? An Experimental Analysis of the Tipping Prompt ($0 to $1 Million)

The Elephant in the Room

Papers like this, however, miss the forest for the trees. Role-playing prompts are not just some percentage better than non-role-playing prompts; in some application scenarios, they are infinitely better because they enable things that are otherwise impossible.

For example, my Mastermind Prompt wouldn’t be possible without role-playing:

ChatGPT Mastermind Group (Prompt Engineering)

πŸ‘‰ Recommended: What Would Jesus Say? Creating a ChatGPT Mastermind with Jesus, Gandhi, Musk, and Gates

So role-playing prompts open up a whole new dimension or category of prompts. An interesting role-playing prompt is to tell the model to play the role of a Linux operating system, for example:
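A prompt along these lines (the wording below is a common community variant, not an official or benchmarked prompt) would look like this:

```python
# An illustrative "act as a Linux terminal" role-play prompt.
# The wording is an assumption based on popular community prompts.
LINUX_ROLE_PROMPT = (
    "I want you to act as a Linux terminal. I will type commands and you "
    "will reply with what the terminal should show. Reply with the terminal "
    "output inside one code block, and nothing else. Do not write "
    "explanations. My first command is: pwd"
)
```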

So, we have one computer system that is capable of simulating every other system, whether it’s a human being, another computer system or OS, or a fantasy character as in CharacterAI:

πŸ‘‰ Recommended: Character.AI – What We Can Learn From Scaling Parasocial Relationships to Millions