Witness the Magic of Python AI-Written and Illustrated Storybooks for Children!

This blog describes a fun mini Python project that uses AI to generate a children’s storybook with AI-generated illustrations and uses PyGame to render the story.

Context

Like over a million other users, I have spent some time this week playing with, and being amazed by, ChatGPT.

I ran one experiment to ask it to write a story for my daughter. It’s incredible to see the story develop right in front of your eyes. This got me thinking about combining the AI-generated text with AI-generated illustrations (see my recent blog about AI-generated advent calendars).

This blog describes a mini project to generate a storybook written and illustrated by artificial intelligence.

All code and a sample story are available on GitHub.

Configuring the Story

Rather than create and illustrate the story in real time, I split the task into two phases:

  • An authoring and illustrating phase to generate and cache the storybook and illustrations (the author.py script), and
  • A simple PyGame storybook app (main.py) to render the story for the reader.

Both phases make use of a core config.py script.

config.py

import os
import re
from dotenv import load_dotenv
 
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
 
PAGES = 10
WORDSPERPAGE=10
STORY = "The best Christmas ever!"
IMAGESTYLE = "Impressionist Painting"
 
CHAR1 = 'Ginger'
CHAR2 = 'Momo'
 
DESCRIBE1 = 'a ginger kitten'
DESCRIBE2 = 'a black kitten'
 
FORMAT = "For each page write PAGE then page number, then write TEXT and the page text then IMAGE followed by a description of an image to illustrate the page. Start each image description with "+ IMAGESTYLE +" of ... "
 
FULLREQUEST = 'Write a '+ str(PAGES) + ' page children\'s story with no more than '+str(WORDSPERPAGE)+' words per page about ' + CHAR1 + ' who is '+ DESCRIBE1 + \
    ' and ' + CHAR2 + ' who is '+DESCRIBE2 + ". The title of the book is " + STORY + '. ' + FORMAT
 
 
def parsestory():
    with open('story.txt','r') as f:
        story=f.read()
 
    pages=[]
    regex = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"
    matches = re.findall(regex, story)
   
    for m in matches:
        illustration=m[9]
        illustration=illustration.replace(CHAR1,DESCRIBE1)
        illustration=illustration.replace(CHAR2,DESCRIBE2)
        pages.append({'page':m[1],'text': m[5],'image':illustration})
   
    return pages

First, to use OpenAI you will need to register and obtain your own API key at https://beta.openai.com/signup. I recommend you create a .env file and define OPENAI_API_KEY within it.
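
As a minimal sketch (not part of the original project), the .env file would hold a single line of the form OPENAI_API_KEY=<your key>, and a small check could fail fast with a readable message when the key is missing. The helper name require_api_key is hypothetical; config.py itself simply calls os.getenv.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the given environment mapping.

    Hypothetical helper: raises a clear error instead of letting a
    missing key surface later as an authentication failure.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set; add a line '{name}=<your key>' to your .env file"
        )
    return key
```

Calling require_api_key() after load_dotenv() would then either return the key or stop the script before any request is sent.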

The storybook is configured to create:

  • A 10-page story with about 10 words per page
  • A story title/theme of “The best Christmas ever”
  • Two main characters: a ginger kitten called ‘Ginger’ and a black kitten called ‘Momo’
  • Illustrations in the impressionist painting style

PAGES = 10
WORDSPERPAGE=10
STORY = "The best Christmas ever!"
IMAGESTYLE = "Impressionist Painting"
 
CHAR1 = 'Ginger'
CHAR2 = 'Momo'
 
DESCRIBE1 = 'a ginger kitten'
DESCRIBE2 = 'a black kitten'

The FULLREQUEST sent to OpenAI’s text-davinci-003 model is assembled from the configuration settings above.

Note:

  • We ask the AI engine both to write the text and to suggest an illustration for each page.
  • The request includes explicit formatting instructions so that the output can later be parsed by a Python script.

FORMAT = "For each page write PAGE then page number, then write TEXT and the page text then IMAGE followed by a description of an image to illustrate the page. Start each image description with "+ IMAGESTYLE +" of ... "
 
FULLREQUEST = 'Write a '+ str(PAGES) + ' page children\'s story with no more than '+str(WORDSPERPAGE)+' words per page about ' + CHAR1 + ' who is '+ DESCRIBE1 + \
    ' and ' + CHAR2 + ' who is '+DESCRIBE2 + ". The title of the book is " + STORY + '. ' + FORMAT

A FULLREQUEST sent to the OpenAI engine would look something like:

Write a 10 page children’s story with no more than 10 words per page about Ginger who is a ginger kitten and Momo who is a black kitten. The title of the book is Christmas Time. For each page write PAGE then page number, then write TEXT and the page text then IMAGE followed by a description of an image to illustrate the page. Start each image description with Impressionist Painting of ...
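
The same prompt could also be assembled with f-strings, which some readers find easier to follow than long concatenations. This is only a sketch: the function build_request and its parameter names are hypothetical and do not appear in config.py, although the wording of the prompt is copied from it.

```python
def build_request(pages, words_per_page, title, style, characters):
    """Assemble the story prompt from the settings.

    `characters` maps each character name to its description,
    for example {"Ginger": "a ginger kitten"}.
    """
    # "Ginger who is a ginger kitten and Momo who is a black kitten"
    cast = " and ".join(f"{name} who is {desc}" for name, desc in characters.items())
    fmt = (
        "For each page write PAGE then page number, then write TEXT and the "
        "page text then IMAGE followed by a description of an image to "
        f"illustrate the page. Start each image description with {style} of ... "
    )
    return (
        f"Write a {pages} page children's story with no more than "
        f"{words_per_page} words per page about {cast}. "
        f"The title of the book is {title}. {fmt}"
    )
```

Passing the characters as a dictionary also means a third character could be added without touching the prompt template.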

An example extract of a response would be:

The generated story (which gets cached to story.txt) is parsed into a list of dictionaries by a routine called parsestory. Each dictionary stores the page number, the page text and the illustration description.

def parsestory():
    with open('story.txt','r') as f:
        story=f.read()
 
    pages=[]
    regex = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"
    matches = re.findall(regex, story)
   
    for m in matches:
        illustration=m[9]
        illustration=illustration.replace(CHAR1,DESCRIBE1)
        illustration=illustration.replace(CHAR2,DESCRIBE2)
        pages.append({'page':m[1],'text': m[5],'image':illustration})

    return pages

The hard work here is done using re.findall and a very carefully constructed regex that matches the format requested for the story. I’m not a regex guru, so I always use https://regex101.com/r/phCIEr/1/ to develop and test a complex regex pattern.

regex = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"

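Basically, this looks for the literals PAGE, TEXT and IMAGE and captures the text between them. Because the pattern has ten groups, re.findall returns one ten-element tuple per page: the page number is m[1], the page text is m[5] and the illustration description is m[9].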
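
To see how the groups line up, here is a small sketch that runs the same pattern over a made-up response. The SAMPLE text (including the colon after TEXT and IMAGE) is invented for illustration and is not a real response from the engine, and parse_pages is a hypothetical variant of parsestory that takes a string instead of reading story.txt.

```python
import re

# The pattern from config.py: ten capture groups per page.
REGEX = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"

# Hypothetical response text, invented for illustration only.
SAMPLE = """PAGE 1
TEXT: Ginger and Momo woke up early.
IMAGE: Impressionist Painting of Ginger and Momo waking up.

PAGE 2
TEXT: Snow was falling outside.
IMAGE: Impressionist Painting of snow falling outside a window.
"""


def parse_pages(story):
    """Return one dictionary per page found in the raw story text."""
    pages = []
    for m in re.findall(REGEX, story):
        # m[1] = page number, m[5] = page text, m[9] = image description
        pages.append({"page": m[1], "text": m[5], "image": m[9]})
    return pages


pages = parse_pages(SAMPLE)
print(pages[0]["text"])   # Ginger and Momo woke up early.
print(pages[1]["image"])  # Impressionist Painting of snow falling outside a window.
```

The (\w.+) groups start at the first word character after each marker and run to the end of that line, which is why any punctuation straight after TEXT or IMAGE is skipped.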

A typical illustration suggestion returned by the AI engine will be something like

Impressionist Painting of Ginger and Momo playing with a ball of yarn.

But the AI illustrator has little chance of drawing Ginger or Momo, or even guessing that they are kittens, so the names are swapped for their defined descriptions using replace before the text is sent to the AI illustration engine.

So the above example is transformed to

Impressionist Painting of a ginger kitten and a black kitten playing with a ball of yarn.
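
The substitution itself can be sketched in isolation as follows. The function name describe_characters and the dictionary argument are hypothetical; config.py does the same thing with two explicit replace calls.

```python
def describe_characters(description, characters):
    """Swap each character name for its description in an image prompt."""
    for name, desc in characters.items():
        description = description.replace(name, desc)
    return description


prompt = "Impressionist Painting of Ginger and Momo playing with a ball of yarn."
cast = {"Ginger": "a ginger kitten", "Momo": "a black kitten"}
print(describe_characters(prompt, cast))
```

Note that str.replace is case-sensitive and works on substrings, so a possessive such as “Ginger's” would become “a ginger kitten's”.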

The Authoring Script

The aiwrite function takes the FULLREQUEST string as a parameter and calls the openai.Completion.create method.

Be sure to set a large value for max_tokens; otherwise, your result text is likely to get truncated. The max_tokens parameter is omitted from the sample code in the OpenAI documentation.
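
One way to notice truncation is sketched below. It assumes the response follows the Completions format used by author.py, in which each choice carries a finish_reason field that reads "length" when the token limit cut the text short. The helper name was_truncated is hypothetical, and the dictionaries are stand-ins for real responses.

```python
def was_truncated(response):
    """True if the first completion choice stopped because it hit max_tokens."""
    return response["choices"][0].get("finish_reason") == "length"


# Stand-in responses for illustration; real ones come from aiwrite().
cut_short = {"choices": [{"text": "PAGE 1 ...", "finish_reason": "length"}]}
complete = {"choices": [{"text": "PAGE 1 ...", "finish_reason": "stop"}]}
```

If was_truncated returns True, raising max_tokens (or lowering PAGES or WORDSPERPAGE) before caching the story would avoid parsing a story with missing pages.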

The aipaint function calls openai.Image.create with the given image description and returns a URL for the created image.

author.py

import config
import urllib.request
import openai
from os.path import exists
from config import parsestory
 
def aiwrite(story):
    openai.api_key = config.OPENAI_API_KEY
    response = openai.Completion.create(
            model="text-davinci-003",
            prompt=story,
            temperature=0.8,
            max_tokens=800
        )
    return response
 
def aipaint(description):
    openai.api_key = config.OPENAI_API_KEY
    print(description)
    response = openai.Image.create(
                    prompt=description,
                    n=1,
                    size="512x512"
                    )
    image_url = response['data'][0]['url']
    return image_url
 
if exists('story.txt'):
    print("Using cached story")
else:
    # write a new text
    print("Sending Request:" + config.FULLREQUEST)
    response=aiwrite(config.FULLREQUEST)
    print("Saving Response:" +response.choices[0].text)
    with open('story.txt', 'w') as f:
        f.write(response.choices[0].text)
 
story=parsestory()
 
for page in story:
    pagenumber=int(page['page'])
    filename='images/output'+ (f"{pagenumber:02d}") +".jpg"
 
    if(exists(filename)):
        print(filename+" already exists")
    else:
        print("Painting...")
        try:
            print("Page "+str(pagenumber))
            print(page['text'])
            painting=aipaint(page['image'])
            print("Storing as "+filename)
            urllib.request.urlretrieve(painting, filename)
        except Exception as err:
            print("Paint "+filename+" failed")
            print(err)
 
print("Illustrations complete and paint has dried!")
print("Now run main.py to read " + config.STORY)

Both the story and each page image are cached as local files. If you want a new story or new images then just delete the locally cached files and rerun the author.py script.

I frequently found one or more images were not that great. For example, I would often get two black kittens. In these cases, I would delete the poor image files and rerun the author.py script to recreate only the missing images.
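
The caching rule (paint a page only when its file is absent) can be sketched on its own like this. The helper names image_filename and pages_to_paint are hypothetical; author.py performs the same check inline inside its loop.

```python
from os.path import exists


def image_filename(pagenumber, image_dir="images"):
    """Build the cache path for a page, e.g. images/output03.jpg."""
    return f"{image_dir}/output{pagenumber:02d}.jpg"


def pages_to_paint(page_count, image_dir="images"):
    """List the page numbers whose illustration is not cached yet."""
    return [
        n for n in range(1, page_count + 1)
        if not exists(image_filename(n, image_dir))
    ]
```

Printing pages_to_paint(config.PAGES) before a rerun shows which illustrations will be regenerated after the weaker images have been deleted.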

The Story App

The main.py script is a simple PyGame script that parses and then renders the story one page at a time, adding a title page as page 0.

main.py

import pygame
import sys
import time
import config
from config import parsestory
import textwrap
 
# Colors
BLACK = (0, 0, 0)
GRAY = (180, 180, 180)
WHITE = (255, 255, 255)
 
HEIGHT = 500
WIDTH = HEIGHT
LINEHEIGHT = 30
TEXTHEIGHT = 200
 
# Create game
pygame.init()
size = width, height = WIDTH, HEIGHT+TEXTHEIGHT
screen = pygame.display.set_mode(size)
pygame.display.set_caption(config.STORY)
 
font_name = pygame.font.get_default_font()
font = pygame.font.SysFont('Arial', 20)
bigfont = pygame.font.SysFont('Arial', 30)
hugefont = pygame.font.SysFont('Arial', 50)
 
def addText(text, position, color, font):
    giftText = font.render(text, True, color)
    giftRect = giftText.get_rect()
    giftRect.center = position
    screen.blit(giftText, giftRect)
 
pageopen=0
story=parsestory()
 
while True:
 
    # Check if game quit
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
     
    if pageopen==0:
        # use last image as title page image
        filename="images/output"+(f"{config.PAGES:02d}")+".jpg"    
    else:
        filename='images/output'+ (f"{pageopen:02d}") +".jpg"
 
    # render the image    
    image = pygame.image.load(filename)
    rect = image.get_rect()
    screen.blit(image, rect)
 
    # render the story text area
    s = pygame.Surface((width,TEXTHEIGHT))            
    s.fill(WHITE)
    screen.blit(s, (0,HEIGHT))
 
    # pageopen is 0 then render a title page - otherwise render the page
    if(pageopen==0):
        addText(config.STORY, ((width / 2), (HEIGHT+(2*LINEHEIGHT))), BLACK, hugefont)  
    else:
        text=story[pageopen-1]['text']
        lines = textwrap.wrap(text, 40, break_long_words=True)
        h=HEIGHT+(2*LINEHEIGHT)
        for line in lines:
            addText(line, ((width / 2), h), BLACK, bigfont)
            h+=LINEHEIGHT
        addText('page '+str(pageopen), ((width / 2),HEIGHT+TEXTHEIGHT-LINEHEIGHT), GRAY, font)
       
    # Check if play button clicked then loop round to next page
    click, _, _ = pygame.mouse.get_pressed()
    if click == 1:  
        pageopen =(pageopen+1)%(config.PAGES+1)
        time.sleep(0.2)
 
    pygame.display.flip()
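
Two small pieces of logic in main.py can be tried without PyGame: the page counter that wraps from the last page back to the title page, and the text wrapping. The function names next_page and wrap_page_text are hypothetical wrappers around the expressions used in the script.

```python
import textwrap


def next_page(pageopen, pages):
    """Advance one page, wrapping from the last page back to the title (0)."""
    return (pageopen + 1) % (pages + 1)


def wrap_page_text(text, width=40):
    """Split page text into lines of at most `width` characters."""
    return textwrap.wrap(text, width, break_long_words=True)


# With a 10-page story, page 10 is followed by the title page again.
print(next_page(9, 10), next_page(10, 10))
for line in wrap_page_text("Ginger and Momo decided to take matters into their own paws."):
    print(line)
```

The modulus is PAGES + 1 rather than PAGES because the title page occupies slot 0 in addition to the story pages.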

The world premiere of Momo and Ginger in ‘The best Christmas ever’.

A Review of the Storybook

šŸ‘ The storybook actually passed limited end-user testing with flying colours; my five-year-old was captivated by the story of her kittens.

This was not the first draft – I tested many many stories and settings. 

Generally, the AI-generated stories were good, though with more text allowed on each page, they sometimes became repetitive. For example, one prototype told me the kittens were excited on over half the pages.

In a story called ‘The year Santa was late’ I was particularly impressed with the nuance in the constructed sentence ‘Ginger and Momo decided to take matters into their own paws.’

The biggest challenge was the illustrations. In real life Ginger and Momo are ginger and white and black and white – I reconfigured them to be mono-coloured because the AI illustrations could not be consistent with where the white patches were in the ten different images.

The real Momo and Ginger

I also chose the impressionist style of illustration as it was the most forgiving in drawing the cats semi-consistently.

I ran experiments using a ‘cartoon’ illustration style – but got wildly different renderings of Ginger and Momo from page to page. Even selecting a more specific ‘Eric Carle-style illustration’ yielded inconsistent results.

Illustrations would be better if we could supply image files of the main characters along with the image description text and have the AI engine combine the two. Maybe such an engine already exists?

Using a ‘photograph’ style for the illustrations could generate some very nice images until the description involved action – then they were terrible! I’m guessing the AI engine has access to many images of cats sitting but has to try and self-draw a cat performing a specific action.

I must confess Page 10 was hand edited as ChatGPT had a nasty habit of lazily writing just ‘The end’ for page 10. A trick I used way back in my own schooldays and indeed haven’t grown out of.

The end!