This blog describes a fun mini Python project that uses AI to generate a children’s storybook with AI-generated illustrations and uses PyGame to render the story.
Context
Like over a million other users, I have spent some time this week playing with and being amazed by ChatGPT.
I ran one experiment to ask it to write a story for my daughter. It’s incredible to see the story develop right in front of your eyes. This got me thinking about combining the AI-generated text with AI-generated illustrations (see my recent blog about AI-generated advent calendars).
This blog describes a mini project to generate a storybook written and illustrated by artificial intelligence.
All code and a sample story are available on GitHub.
Configuring the Story
Rather than create and illustrate the story in real time, I split the task into two phases:
- An authoring and illustrating phase to generate and cache the storybook and illustrations (the author.py script), and
- A simple PyGame storybook app (main.py) to render the story for the reader.
Both phases make use of a core config.py script.
config.py
import os
import re
from dotenv import load_dotenv

# Load OPENAI_API_KEY from the local .env file
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Story settings
PAGES = 10
WORDSPERPAGE = 10
STORY = "The best Christmas ever!"
IMAGESTYLE = "Impressionist Painting"
CHAR1 = 'Ginger'
CHAR2 = 'Momo'
DESCRIBE1 = 'a ginger kitten'
DESCRIBE2 = 'a black kitten'

# The descriptions already start with "a", so no extra article is added here
FORMAT = "For each page write PAGE then page number, then write TEXT and the page text then IMAGE followed by a description of an image to illustrate the page. Start each image description with " + IMAGESTYLE + " of ... "
FULLREQUEST = 'Write a ' + str(PAGES) + ' page children\'s story with no more than ' + str(WORDSPERPAGE) + ' words per page about ' + CHAR1 + ' who is ' + DESCRIBE1 + \
    ' and ' + CHAR2 + ' who is ' + DESCRIBE2 + ". The title of the book is " + STORY + '. ' + FORMAT


def parsestory():
    # Read the cached story and split it into one dictionary per page
    with open('story.txt', 'r') as f:
        story = f.read()
    pages = []
    regex = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"
    matches = re.findall(regex, story)
    for m in matches:
        # Swap character names for descriptions the image engine can draw
        illustration = m[9]
        illustration = illustration.replace(CHAR1, DESCRIBE1)
        illustration = illustration.replace(CHAR2, DESCRIBE2)
        pages.append({'page': m[1], 'text': m[5], 'image': illustration})
    return pages
Firstly, to use OpenAI, you will need to register and obtain your own API key at https://beta.openai.com/signup. I recommend you create a .env file and define OPENAI_API_KEY within it.
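A .env file for this purpose would normally hold a single line of the form OPENAI_API_KEY=<your key>. As a minimal sketch of checking that the key actually reached the environment before any request is sent (get_api_key is a hypothetical helper, not part of the project code):

```python
import os


def get_api_key(name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the project reads the key
    directly with os.getenv in config.py.
    """
    key = os.getenv(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set; add a line such as {name}=<your key> to .env"
        )
    return key
```

Failing early like this avoids a less obvious authentication error later, when the first request is made.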
The storybook is configured to create:
- A 10-page story with about 10 words per page
- A story title/theme of “The best Christmas ever”
- Two main characters: a ginger kitten called “Ginger” and a black kitten called “Momo”
- The illustrations will be in the impressionist painting style.
PAGES = 10
WORDSPERPAGE = 10
STORY = "The best Christmas ever!"
IMAGESTYLE = "Impressionist Painting"
CHAR1 = 'Ginger'
CHAR2 = 'Momo'
DESCRIBE1 = 'a ginger kitten'
DESCRIBE2 = 'a black kitten'
The FULLREQUEST string sent to OpenAI’s text-davinci-003 engine is built from the configuration settings above.
Note:
- We ask the AI engine both to write the text and to suggest an illustration for each page.
- The request includes explicit formatting instructions so that the output can subsequently be parsed by a Python script.
FORMAT = "For each page write PAGE then page number, then write TEXT and the page text then IMAGE followed by a description of an image to illustrate the page. Start each image description with " + IMAGESTYLE + " of ... "
FULLREQUEST = 'Write a ' + str(PAGES) + ' page children\'s story with no more than ' + str(WORDSPERPAGE) + ' words per page about ' + CHAR1 + ' who is ' + DESCRIBE1 + \
    ' and ' + CHAR2 + ' who is ' + DESCRIBE2 + ". The title of the book is " + STORY + '. ' + FORMAT
A FULLREQUEST to the openai engine would look something like:
Write a 10 page children’s story with no more than 10 words per page about Ginger who is a ginger kitten and Momo who is a black kitten. The title of the book is Christmas Time. For each page write PAGE then page number, then write TEXT and the page text then IMAGE followed by a description of an image to illustrate the page. Start each image description with Impressionist Painting of …
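String concatenation makes it easy to lose a space or double up an article. As a hedged alternative, the same request could be assembled with f-strings. This is only a sketch: build_request and its parameter names are hypothetical and do not appear in the project.

```python
def build_request(pages, words_per_page, title, style, characters):
    """Build the story prompt from the configuration values.

    characters is a list of (name, description) pairs,
    for example [("Ginger", "a ginger kitten")].
    Hypothetical helper; the project builds FULLREQUEST by concatenation.
    """
    cast = " and ".join(f"{name} who is {desc}" for name, desc in characters)
    layout = (
        "For each page write PAGE then page number, then write TEXT and the "
        "page text then IMAGE followed by a description of an image to "
        f"illustrate the page. Start each image description with {style} of ... "
    )
    return (
        f"Write a {pages} page children's story with no more than "
        f"{words_per_page} words per page about {cast}. "
        f"The title of the book is {title}. {layout}"
    )


request = build_request(
    10, 10, "The best Christmas ever!", "Impressionist Painting",
    [("Ginger", "a ginger kitten"), ("Momo", "a black kitten")],
)
print(request)
```

Passing the characters as a list also means a third character could be added without touching the template.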
An example extract of a response would be:
Parsing the generated story (which gets cached to story.txt) into a list of dictionary items is performed by a routine called parsestory. Each dictionary item stores the page number, the page text and the illustration description text.
def parsestory():
    with open('story.txt', 'r') as f:
        story = f.read()
    pages = []
    regex = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"
    matches = re.findall(regex, story)
    for m in matches:
        illustration = m[9]
        illustration = illustration.replace(CHAR1, DESCRIBE1)
        illustration = illustration.replace(CHAR2, DESCRIBE2)
        pages.append({'page': m[1], 'text': m[5], 'image': illustration})
    return pages
The hard work here is done using re.findall and a very carefully constructed regex that matches the format requested for the story. I’m not a regex guru, so I always use https://regex101.com/r/phCIEr/1/ to develop and test a complex regex pattern.
regex = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"
Basically, this looks for the literals PAGE, TEXT, and IMAGE and returns the matching elements between them.
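To see which capture groups carry the useful values, the pattern can be run against a small piece of text. The sample below is hand-written for illustration in the requested format and is not real engine output; parse_pages is a hypothetical variant of parsestory that takes the text as an argument instead of reading story.txt.

```python
import re

# Hypothetical response text, written by hand in the requested format.
SAMPLE = """PAGE 1
TEXT
Ginger and Momo woke up early on Christmas morning.
IMAGE
Impressionist Painting of Ginger and Momo waking up beside a Christmas tree.

PAGE 2
TEXT
They found a ball of yarn under the tree.
IMAGE
Impressionist Painting of Ginger and Momo playing with a ball of yarn.
"""

REGEX = r"(PAGE) (\d+)([\s\S]*?)(TEXT)([\s\S]*?)(\w.+)([\s\S]*?)(IMAGE)([\s\S]*?)(\w.+)"


def parse_pages(text):
    """Return one dictionary per page found in text.

    Group 1 is the page number, group 5 the first line of page text,
    and group 9 the first line of the image description.
    """
    return [
        {"page": m[1], "text": m[5], "image": m[9]}
        for m in re.findall(REGEX, text)
    ]


pages = parse_pages(SAMPLE)
for page in pages:
    print(page["page"], "|", page["text"], "|", page["image"])
```

Because `.` does not match a newline, each `(\w.+)` group captures a single line, so page text that spills onto a second line would be cut short.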
A typical illustration suggestion returned by the AI engine will be something like:
Impressionist Painting of Ginger and Momo playing with a ball of yarn.
But the AI illustrator has little chance of drawing Ginger or Momo, or even guessing they are kittens, so the names are swapped for their defined descriptions using replace before the text is sent to the AI illustration engine.
So the above example is transformed to:
Impressionist Painting of a ginger kitten and a black kitten playing with a ball of yarn.
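One caveat with a plain string replace is that it also rewrites longer words containing a name, so a Christmas story mentioning "Gingerbread" would be mangled. The sketch below swaps in a word-boundary regex instead of the post's str.replace. describe_characters is a hypothetical helper, not the project's code.

```python
import re


def describe_characters(illustration, characters):
    """Swap each character name for its description, whole words only.

    characters maps a name to its description. Hypothetical helper that
    uses re.sub with word boundaries rather than str.replace.
    """
    for name, description in characters.items():
        pattern = rf"\b{re.escape(name)}\b"
        illustration = re.sub(pattern, description, illustration)
    return illustration


CAST = {"Ginger": "a ginger kitten", "Momo": "a black kitten"}

prompt = describe_characters(
    "Impressionist Painting of Ginger and Momo playing with a ball of yarn.",
    CAST,
)
print(prompt)
```

The match is still case-sensitive, which is what stops the lowercase "ginger" in the description from being replaced a second time.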
The Authoring Script
The aiwrite function takes the FULLREQUEST string as a parameter and calls the openai.Completion.create method.
Be sure to set a large value for max_tokens; otherwise, your result text is likely to get truncated. The max_tokens parameter is omitted from the sample code in the openai documentation.
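A truncated story is easy to miss because the text simply stops. As a small sketch, a completion response reports a finish_reason of "length" when generation stopped at the token limit, so that field can be checked before caching the story. was_truncated is a hypothetical helper, and the dictionary below is a hand-made stand-in containing only the fields used here, not a real response.

```python
def was_truncated(response):
    """Return True if the first completion stopped at the max_tokens limit."""
    return response["choices"][0]["finish_reason"] == "length"


# Hand-made stand-in for an API response (assumption: only these fields).
fake_response = {
    "choices": [{"text": "PAGE 1 TEXT Ginger and", "finish_reason": "length"}]
}

if was_truncated(fake_response):
    print("Story was cut off; try a larger max_tokens value")
```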
The aipaint function uses openai.Image.create with the given image description and returns a URL for the created image.
author.py
import os
import urllib.request
from os.path import exists

import openai

import config
from config import parsestory


def aiwrite(story):
    # Ask the completion engine to write the story text
    openai.api_key = config.OPENAI_API_KEY
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=story,
        temperature=0.8,
        max_tokens=800
    )
    return response


def aipaint(description):
    # Ask the image engine for one illustration and return its URL
    openai.api_key = config.OPENAI_API_KEY
    print(description)
    response = openai.Image.create(
        prompt=description,
        n=1,
        size="512x512"
    )
    image_url = response['data'][0]['url']
    return image_url


if exists('story.txt'):
    print("Using cached story")
else:
    # write a new text
    print("Sending Request:" + config.FULLREQUEST)
    response = aiwrite(config.FULLREQUEST)
    print("Saving Response:" + response.choices[0].text)
    with open('story.txt', 'w') as f:
        f.write(response.choices[0].text)

story = parsestory()

# Make sure the image cache folder exists before downloading into it
os.makedirs('images', exist_ok=True)

for page in story:
    pagenumber = int(page['page'])
    filename = 'images/output' + (f"{pagenumber:02d}") + ".jpg"
    if exists(filename):
        print(filename + " already exists")
    else:
        print("Painting...")
        try:
            print("Page " + str(pagenumber))
            print(page['text'])
            painting = aipaint(page['image'])
            print("Storing as " + filename)
            urllib.request.urlretrieve(painting, filename)
        except Exception as err:
            print("Paint " + filename + " failed")
            print(err)

print("Illustrations complete and paint has dried!")
print("Now run main.py to read " + config.STORY)
Both the story and each page image are cached as local files. If you want a new story or new images, just delete the locally cached files and rerun the author.py script.
I frequently found one or more images were not that great. For example, I would often get two black kittens. In these cases, I would delete the poor image files and rerun the author.py script to recreate the missing images only.
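The caching rule is simply "paint a page only if its file is absent". A minimal sketch of listing which pages would be repainted on the next run, assuming the images/outputNN.jpg naming used by author.py (missing_pages is a hypothetical helper):

```python
import os
import tempfile


def missing_pages(pages, image_dir="images"):
    """Return the page numbers whose image file is not on disk yet."""
    return [
        n for n in range(1, pages + 1)
        if not os.path.exists(os.path.join(image_dir, f"output{n:02d}.jpg"))
    ]


# Demonstration in a throwaway directory so no real images are touched.
with tempfile.TemporaryDirectory() as demo_dir:
    for n in (1, 2, 4):
        # Create empty placeholder files standing in for kept images
        open(os.path.join(demo_dir, f"output{n:02d}.jpg"), "w").close()
    demo_missing = missing_pages(5, demo_dir)

print(demo_missing)
```

Running a check like this before author.py shows how many image requests the rerun will make.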
The Story App
The main.py script is a simple PyGame script which parses and then renders the story one page at a time, adding a title page as page 0.
main.py
import pygame
import sys
import time
import config
from config import parsestory
import textwrap

# Colors
BLACK = (0, 0, 0)
GRAY = (180, 180, 180)
WHITE = (255, 255, 255)

HEIGHT = 500
WIDTH = HEIGHT
LINEHEIGHT = 30
TEXTHEIGHT = 200

# Create game
pygame.init()
size = width, height = WIDTH, HEIGHT+TEXTHEIGHT
screen = pygame.display.set_mode(size)
pygame.display.set_caption(config.STORY)
font_name = pygame.font.get_default_font()
font = pygame.font.SysFont('Arial', 20)
bigfont = pygame.font.SysFont('Arial', 30)
hugefont = pygame.font.SysFont('Arial', 50)


def addText(text, position, color, font):
    # Render one line of text centred on the given position
    giftText = font.render(text, True, color)
    giftRect = giftText.get_rect()
    giftRect.center = position
    screen.blit(giftText, giftRect)


pageopen = 0
story = parsestory()

while True:
    # Check if game quit
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()

    if pageopen == 0:
        # use last image as title page image
        filename = "images/output" + (f"{config.PAGES:02d}") + ".jpg"
    else:
        filename = 'images/output' + (f"{pageopen:02d}") + ".jpg"

    # render the image
    image = pygame.image.load(filename)
    rect = image.get_rect()
    screen.blit(image, rect)

    # render the story text area
    s = pygame.Surface((width, TEXTHEIGHT))
    s.fill(WHITE)
    screen.blit(s, (0, HEIGHT))

    # pageopen is 0 then render a title page - otherwise render the page
    if pageopen == 0:
        addText(config.STORY, ((width / 2), (HEIGHT+(2*LINEHEIGHT))), BLACK, hugefont)
    else:
        text = story[pageopen-1]['text']
        lines = textwrap.wrap(text, 40, break_long_words=True)
        h = HEIGHT+(2*LINEHEIGHT)
        for line in lines:
            addText(line, ((width / 2), h), BLACK, bigfont)
            h += LINEHEIGHT
        addText('page '+str(pageopen), ((width / 2), HEIGHT+TEXTHEIGHT-LINEHEIGHT), GRAY, font)

    # Check if play button clicked then loop round to next page
    click, _, _ = pygame.mouse.get_pressed()
    if click == 1:
        pageopen = (pageopen+1) % (config.PAGES+1)
        time.sleep(0.2)

    pygame.display.flip()
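The page-turning rules in main.py can be pulled out into two small pure functions, which makes the wrap-around and the title page easier to check without opening a window. This is a sketch only; next_page and image_for_page are hypothetical names that do not appear in main.py.

```python
def next_page(pageopen, pages):
    """Advance one page, wrapping from the last page back to the title (0)."""
    return (pageopen + 1) % (pages + 1)


def image_for_page(pageopen, pages):
    """Return the image file for a page; the title page reuses the last image."""
    number = pages if pageopen == 0 else pageopen
    return f"images/output{number:02d}.jpg"


# Walk through a 3-page book: title, pages 1 to 3, then back to the title
current = 0
for _ in range(5):
    print(current, image_for_page(current, 3))
    current = next_page(current, 3)
```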
The world premiere of Momo and Ginger in “The best Christmas ever”.
A Review of the Storybook
The storybook actually passed limited end-user testing with flying colours; my five-year-old was captivated by the story of her kittens.
This was not the first draft – I tested many many stories and settings.
Generally, the AI-generated stories were good, though with more text allowed on each page, they sometimes became repetitive. For example, one prototype told me the kittens were excited on over half the pages.
In a story called “The year Santa was late” I was particularly impressed with the nuance in the constructed sentence “Ginger and Momo decided to take matters into their own paws.”
The biggest challenge was the illustrations. In real life Ginger and Momo are ginger and white and black and white – I reconfigured them to be mono-coloured because the AI illustrations could not be consistent with where the white patches were in the ten different images.
I also chose the impressionist style of illustration as it was the most forgiving in drawing the cats semi-consistently.
I ran experiments using a “cartoon” illustration style – but got wildly different renderings of Ginger and Momo from page to page. Even selecting a more specific “Eric Carle-style illustration” yielded inconsistent results.
Illustrations would be better if we could supply image files of the main characters along with the image description text and have the AI engine combine the two. Maybe such an engine already exists?
Using a “photograph” style for the illustrations could generate some very nice images until the description involved action – then they were terrible! I’m guessing the AI engine has access to many images of cats sitting but has to try and self-draw a cat performing a specific action.
I must confess Page 10 was hand-edited, as ChatGPT had a nasty habit of lazily writing just “The end” for page 10. A trick I used way back in my own schooldays and indeed haven’t grown out of.
The end!