The ChatGPT Prompt Hack: Let’s Think Step by Step – Does It Really Work?

I was just reading a new research paper from OpenAI researchers proposing that model accuracy improves if you append “Let’s think step by step” to your prompt:

Prompt with “Let’s think step by step!”

They used a two-step prompting strategy that inserts the phrase automatically, so the user doesn’t notice it:
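To make the idea concrete, here’s a minimal sketch of what such a two-step pipeline could look like. The prompt templates, the “Therefore, the answer is” cue, and the `complete` callable are my own assumptions for illustration, not taken from the paper, and `fake_complete` is a canned stand-in so the script runs without any API key:

```python
from typing import Callable

# Assumed wording for illustration; a real pipeline may phrase these differently.
TRIGGER = "Let's think step by step."
ANSWER_CUE = "Therefore, the answer is"


def build_reasoning_prompt(question: str) -> str:
    """Step 1 prompt: the question plus the reasoning trigger."""
    return f"Q: {question}\nA: {TRIGGER}"


def build_answer_prompt(question: str, reasoning: str) -> str:
    """Step 2 prompt: the step 1 prompt, the model's reasoning, and an answer cue."""
    return f"{build_reasoning_prompt(question)} {reasoning}\n{ANSWER_CUE}"


def two_step_answer(question: str, complete: Callable[[str], str]) -> str:
    """Call the model twice: first for the reasoning, then for the final answer."""
    reasoning = complete(build_reasoning_prompt(question))
    return complete(build_answer_prompt(question, reasoning)).strip()


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns canned text."""
    if prompt.endswith(ANSWER_CUE):
        return " 36"
    return "The numbers are the squares 1, 4, 9, 16, 25, so the next one is 6 * 6."


if __name__ == "__main__":
    print(two_step_answer("What is the next number in 1, 4, 9, 16, 25?", fake_complete))
```

The user only ever types the question. The trigger phrase and the answer cue are added behind the scenes, which is why you wouldn’t see them in the UI.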

But we can do it explicitly in our own prompts to try to improve model performance and accuracy!
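If you send prompts from your own code, adding the phrase explicitly takes only a few lines. Here’s a small sketch; the helper name `with_step_by_step` is hypothetical, and it simply skips the addition if the prompt already ends with the phrase:

```python
TRIGGER = "Let's think step by step."


def with_step_by_step(prompt: str) -> str:
    """Append the trigger phrase unless the prompt already ends with it."""
    base = prompt.rstrip()
    if base.lower().endswith(TRIGGER.lower()):
        return base
    return f"{base} {TRIGGER}"


if __name__ == "__main__":
    print(with_step_by_step(
        "Create a Python script that guesses the user's IQ based on some answers!"
    ))
```

Calling the helper twice on the same prompt doesn’t duplicate the phrase.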

However, I believe that ChatGPT with GPT-4 is already implicitly prompt-engineered by OpenAI researchers to include the "Let's think step by step" addition, because the output is already step by step even without adding it explicitly:

Prompt without “Let’s think step by step!”

Let’s look at two examples to see if it really makes a difference!

Example 1: Create Python Script to Guess User’s IQ

Without “Let’s Think Step by Step”

Basic prompt without “Let’s think step by step”:

Create a Python script that guesses the user's IQ based on some answers!

Here’s the output:

Creating an accurate IQ estimation script would require a comprehensive test with a variety of questions that test different aspects of intelligence. However, I can provide you with a simple Python script that estimates the user’s IQ based on their answers to a few basic questions. Keep in mind that this is just for fun and should not be taken as an accurate representation of anyone’s IQ.

import random

def iq_question_1():
    question = "Which number is the odd one out? 3, 5, 7, 11, 14, 17"
    answer = 14
    return question, answer

def iq_question_2():
    question = "What is the next number in the sequence? 1, 4, 9, 16, 25, ?"
    answer = 36
    return question, answer

def iq_question_3():
    question = "What is the missing letter in the series? a, b, d, e, g, ?"
    answer = "h"
    return question, answer

def iq_question_4():
    question = "If 2 * (3 + 4) = ?"
    answer = 14
    return question, answer

def main():
    questions = [iq_question_1, iq_question_2, iq_question_3, iq_question_4]
    random.shuffle(questions)
    correct_answers = 0

    print("Welcome to the simple IQ estimation script!")
    print("Answer the following questions to get an estimate of your IQ (for fun).\n")

    for i, question_func in enumerate(questions, start=1):
        question, answer = question_func()
        user_answer = input(f"{i}. {question}\nYour answer: ")
        
        try:
            user_answer = type(answer)(user_answer)
        except ValueError:
            print("Invalid input. Skipping the question.\n")
            continue
        
        if user_answer == answer:
            correct_answers += 1
            print("Correct!\n")
        else:
            print("Incorrect.\n")

    iq_estimate = 50 + (correct_answers / len(questions)) * 100
    print(f"Your estimated IQ based on your answers is: {iq_estimate}")

if __name__ == "__main__":
    main()

This script asks the user four simple questions, shuffles their order, and estimates the user’s IQ based on the number of correct answers. Note that this is a very simplified version and not an accurate representation of an actual IQ test.

Here’s a live demo of me running this script (IQ = 150, I don’t think so) 😉

With “Let’s Think Step by Step”

Basic prompt with “Let’s think step by step”:

Create a Python script that guesses the user's IQ based on some answers! Let's think step by step.

Here’s the output:

To create a simple Python script that guesses a user’s IQ based on their answers, we can follow these steps:

  1. Define a list of questions and corresponding correct answers.
  2. Present the questions to the user and collect their answers.
  3. Evaluate the user’s performance and provide an estimated IQ based on their score.

Keep in mind that this is just a fun script and will not give an accurate IQ estimate. For a proper assessment, users should take a professionally administered IQ test.

Here’s an example script:

import random

def main():
    questions = [
        ("What is the square root of 144?", "12"),
        ("What comes next in the sequence: 2, 4, 6, 8, ...?", "10"),
        ("Which planet is closest to the Sun?", "Mercury"),
        ("How many continents are there?", "7"),
        ("If you have 20 apples and give away 5, how many do you have left?", "15")
    ]
    
    random.shuffle(questions)
    num_questions = 5
    num_correct = 0

    print("Answer the following questions to get an estimated IQ score:")

    for i in range(num_questions):
        question, correct_answer = questions[i]
        user_answer = input(f"Question {i + 1}: {question}\nYour answer: ")
        
        if user_answer.lower() == correct_answer.lower():
            num_correct += 1

    estimated_iq = estimate_iq(num_correct, num_questions)
    print(f"Your estimated IQ is: {estimated_iq}")


def estimate_iq(num_correct, num_questions):
    percentage_correct = num_correct / num_questions
    estimated_iq = 100 + (percentage_correct * 100)
    return round(estimated_iq)


if __name__ == "__main__":
    main()

This script presents the user with a set of questions, collects their answers, and then provides an estimated IQ score based on the number of correct answers. Note that the estimation method is not scientifically valid and is just for fun.

Make up your own mind about which of the scripts is better. However, I think the first one, without the "Let's think step by step" addition, worked better. The questions were better and the guessed IQ was lower (which is more realistic in my case).
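The difference in the guessed IQ comes straight from the two scoring formulas. Here’s a small sketch that puts them side by side; the function names `iq_script_1` and `iq_script_2` are mine, but the formulas are copied from the two scripts above:

```python
def iq_script_1(correct: int, total: int) -> float:
    """Formula from the first script: 50 plus up to 100 points."""
    return 50 + (correct / total) * 100


def iq_script_2(correct: int, total: int) -> int:
    """Formula from the second script: 100 plus up to 100 points, rounded."""
    return round(100 + (correct / total) * 100)


if __name__ == "__main__":
    # The first script asks 4 questions, the second asks 5.
    for correct in range(5):
        print(f"Script 1, {correct}/4 correct: {iq_script_1(correct, 4)}")
    for correct in range(6):
        print(f"Script 2, {correct}/5 correct: {iq_script_2(correct, 5)}")
```

The first script can only return values between 50 and 150, while the second one starts at 100 even with zero correct answers and goes up to 200.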

Let’s try another one:

Example 2: What Is THE Answer?

Prompt without “Let’s think step by step!”
Prompt with “Let’s think step by step!”

The second answer in this case is more comprehensive, but I didn’t find it objectively better. The first answer is already on point, and just adding more words to the second answer doesn’t make it better.

After trying the “Let’s think step by step!” trick on many different prompts, my personal conclusion is that it doesn’t improve the answer for a normal user. The reason is, in my view, that OpenAI’s prompt engineers already prompt the model using this trick to improve model performance. We just cannot see it because it’s done implicitly by the UI. So adding it a second time won’t yield much better results. That’s my theory anyway.

