You can extract text from images with EasyOCR, a deep learning-based OCR tool in Python. EasyOCR performs very well on invoices, handwriting, car plates, and public signs.
First released in 2007, PyTesseract [1] is the go-to library for extracting text from images. It uses classical computer vision methods to perform optical character recognition (OCR) and has featured neural network components such as LSTM since its fourth version.
You may ask: is there any alternative that is as good as PyTesseract for OCR? Yes, there is: EasyOCR [2]. It is a new, deep learning-based module for reading text from all kinds of images in more than 80 languages.
In this article, we will go through a three-step tutorial.
- First, we will install the required libraries.
- Second, we will perform image-to-text processing using EasyOCR on various images.
- Third, we will use OpenCV to overlay detected text on the original images.

Let's get started.
Step 1: Install and Import Required Modules
Optical character recognition is the process of reading text from images. It is an easy task for humans, but a harder one for computers, which have to identify text from raw image pixels. For this tutorial, we will need the OpenCV, Matplotlib, Numpy, PyTorch, and EasyOCR modules. Here's the GitHub repo of this tutorial.
First, create a virtual environment for this project. Then, install the mentioned modules in a Jupyter notebook:
!pip install opencv-python
!pip install matplotlib
!pip install numpy
!pip install torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
!pip install easyocr
The OpenCV module handles computer vision-related operations in Python. Specifically, we will use it to overlay recognized text on the original images later. We need the Matplotlib module to display images, and we will use the Numpy module to convert images into arrays.
PyTorch is a prerequisite for the EasyOCR module. Its installation varies according to your OS and GPU driver requirements. You can get the installation commands from the PyTorch homepage [3]. Copy and execute the respective command, as shown in Figure 1, if you are on Windows.
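If you want to confirm the installation went through, a quick optional check prints the PyTorch version and whether a CUDA GPU is visible (with the CPU-only build above, the second line prints False):

import torch

print(torch.__version__)          # e.g. '1.7.1+cpu'
print(torch.cuda.is_available())  # True only if a CUDA GPU is set up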
Now, go ahead and install the EasyOCR module — the tool we need for extracting text from images. At this point, you should be able to execute the following lines of code in your notebook:
import cv2
import numpy as np
import easyocr
import matplotlib.pyplot as plt
%matplotlib inline
Note that the %matplotlib inline magic command is exclusive to Jupyter notebooks; it is not required in a Python script. It sets the Matplotlib backend to display figures inline in the notebook rather than in a separate window.
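For reference, if you later move this code into a plain .py script, omit the magic command and display figures explicitly:

import matplotlib.pyplot as plt

# in a script, open the current figure in a window instead of inline
plt.show()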
You’re off to a great start! Now, onto the next step.
Step 2: Load Images and Extract Text using EasyOCR
For copyright reasons, the images used in the sample notebook are not provided in the GitHub repo. Feel free to download them from Unsplash.com or use your own images. Define the path of an image using the following code:
im_1_path = './folder/image_name.jpg'
Next, initialize an EasyOCR reader with a list of languages you would like to use. Use the reader to read an image with the following function:
def recognize_text(img_path):
    '''loads an image and recognizes text.'''
    reader = easyocr.Reader(['en'])
    return reader.readtext(img_path)
Did it surprise you that two lines of code are all you need to perform OCR? "Easy" for EasyOCR! The recognize_text() function initializes an OCR reader and assigns it to a variable named reader. It takes a list of languages as a parameter; for this tutorial, we only want to recognize English text, thus the 'en' in the list. The readtext method reads an image given its file path. The OCR result is then returned as the output of recognize_text().
result = recognize_text(im_1_path)
result
Note that EasyOCR takes considerably longer to run on a CPU than on a GPU; the im_1_path image took around ten seconds to process with recognize_text(). Figure 2 shows the operations in the EasyOCR framework, which include image preprocessing, deep learning model recognition, and image postprocessing.
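Much of that time goes into loading the detection and recognition models, which recognize_text() repeats on every call. Here is a minimal sketch of a faster variant that initializes the reader once and reuses it (the gpu flag of easyocr.Reader is optional; gpu=False forces CPU):

import time

reader = easyocr.Reader(['en'], gpu=False)  # models are loaded once, here

start = time.time()
result = reader.readtext(im_1_path)
print(f'OCR took {time.time() - start:.1f} seconds')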
Here is the output of the EasyOCR module:
[([[1421, 1139], [1453, 1139], [1453, 1177], [1421, 1177]], 'S', 0.8625819477165351),
 ([[1524, 1038], [2201, 1038], [2201, 1211], [1524, 1211]], 'CCC444', 0.9068348515895301),
 ([[1641, 1201], [2012, 1201], [2012, 1245], [1641, 1245]], 'T E S L A.C O M', 0.33458756243407134),
 ([[2519, 1254], [2790, 1254], [2790, 1284], [2519, 1284]], 'DUAL MSTOF', 0.24584700695087508)]
It returns a list of detected text elements, each containing three pieces of information: the bounding box vertices, the recognized text, and the confidence level of the detection. From the output, EasyOCR detected four text elements: 'S', 'CCC444', 'T E S L A.C O M', and 'DUAL MSTOF'.
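Since each element is a (bounding box, text, confidence) tuple, you can unpack the list directly, for example:

for (bbox, text, prob) in result:
    print(f'{text} (confidence: {prob:.2f})')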
To check the accuracy of the OCR, we need to display the original image on our notebook:
img_1 = cv2.imread(im_1_path)
img_1 = cv2.cvtColor(img_1, cv2.COLOR_BGR2RGB)
plt.imshow(img_1)
The imread method of the OpenCV module loads an image as a Numpy array, which is assigned to the img_1 variable. The default color channel order of OpenCV is (Blue, Green, Red) instead of (Red, Green, Blue), which is why we use the cvtColor method for channel conversion. Otherwise, the image would be displayed with its blue and red channels swapped. The image is shown in Figure 3: the rear view of a car, with its vehicle registration plate visible.
Comparing the image with its OCR output, the car plate is captured accurately, and EasyOCR detects both the country code and the carmaker's name. Yet, the 'DUAL MOTOR' text on the right side of the car is detected as 'DUAL MSTOF'. Image preprocessing techniques can be used to increase the OCR accuracy in such cases, but for now, we will only test the performance of EasyOCR out of the box.
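For readers who do want to experiment with preprocessing, here is a rough sketch of one common approach, grayscale conversion plus Otsu thresholding with OpenCV; note that readtext also accepts a Numpy array, so the processed image can be passed to it directly:

# sketch: binarize the image to sharpen low-contrast text before OCR
gray = cv2.cvtColor(img_1, cv2.COLOR_RGB2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
result_binary = easyocr.Reader(['en']).readtext(binary)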
Step 3: Overlay Recognized Text on Images using OpenCV
Now, we want to draw a rectangle around each recognized text element on its original image. The overlay_ocr_text() function will be explained task-by-task.
def overlay_ocr_text(img_path, save_name):
    '''loads an image, recognizes text, and overlays the text on the image.'''
    # load the image and correct its color channels
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # size the figure from the image dimensions (shape[1] is width, shape[0] is height)
    dpi = 80
    fig_width, fig_height = int(img.shape[1]/dpi), int(img.shape[0]/dpi)
    f, axarr = plt.subplots(1, 2, figsize=(fig_width, fig_height))
    axarr[0].imshow(img)
First, we use the OpenCV module to load an image as a Numpy array and correct its color channels. The array is assigned to the variable img. We want to display two images: the original image, and the original image with the recognized text overlaid. The subplots method of Matplotlib lets us display more than one image in a single figure, and the imshow method of axarr[0] displays the original image.
    # recognize text
    result = recognize_text(img_path)

    # if the OCR confidence (prob) is at least 0.5, overlay the bounding box and text
    for (bbox, text, prob) in result:
        if prob >= 0.5:
            # display the detected text and its confidence
            print(f'Detected text: {text} (Probability: {prob:.2f})')

            # get the top-left and bottom-right bbox vertices
            (top_left, top_right, bottom_right, bottom_left) = bbox
            top_left = (int(top_left[0]), int(top_left[1]))
            bottom_right = (int(bottom_right[0]), int(bottom_right[1]))

            # draw a rectangle around the detected text
            cv2.rectangle(img=img, pt1=top_left, pt2=bottom_right,
                          color=(255, 0, 0), thickness=10)

            # put the recognized text above its bounding box
            cv2.putText(img=img, text=text, org=(top_left[0], top_left[1] - 10),
                        fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=1,
                        color=(255, 0, 0), thickness=8)
The recognize_text() function returns the OCR output, which is assigned to the result variable. A for loop then goes through each text element in the variable. Recognized text elements are overlaid only if their OCR confidence level is at least 0.5 (prob >= 0.5). The top-left and bottom-right vertices of each bounding box are then obtained and converted into tuples of integer values, as required by OpenCV.
The rectangle method draws a red bounding box around each detected text element, and the putText method displays the recognized text above its respective bounding box. As all of this happens inside the for loop, the operation repeats for every recognized text in the result variable.
    # show and save the final image
    axarr[1].imshow(img)
    plt.savefig(f'./output/{save_name}_overlay.jpg', bbox_inches='tight')
Finally, the overlay_ocr_text() function displays every created text and bounding box. The imshow method of axarr[1] displays the final image. As the left and right images belong to the same figure, they are displayed as one final image. The savefig method saves the final image to a defined local directory.
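With the function complete, a single call processes and saves an image. The save name below is just a placeholder, and we create the ./output/ folder first since savefig expects it to exist:

import os

os.makedirs('./output', exist_ok=True)  # savefig does not create folders
overlay_ocr_text(im_1_path, 'image_1')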
How Well Did EasyOCR Perform?
The figures below show how well EasyOCR performs on different kinds of images. We will test the library on handwriting, digits, an electronic invoice, and a public sign. For a complete overview, please refer to the demo notebook in the given GitHub repo.
EasyOCR manages to detect every text in Figure 5, but the text sequence is not entirely correct.
EasyOCR detects everything in Figure 6 correctly. It is a relatively large image with clear printed digits and text, which helps the OCR perform better.
EasyOCR detects most of the text in Figure 7 correctly, except for the text on the right-hand side.
EasyOCR also manages to detect every text on the invoice accurately without any image preprocessing.
Again, EasyOCR nails it for Figure 8. Every text in the figure is correctly detected.
Overall, EasyOCR performs well on images with clear text. It works fine without any image preprocessing, which saves time and cost.
Bonus: Text-to-Speech
Outputs from OCR can be further utilized in a simple text-to-speech application, which converts text into a spoken utterance. First, we need to install the PyTTSX3 [4] module as follows:
!pip install pyttsx3
The implementation can be done in five lines of code:
import pyttsx3

engine = pyttsx3.init()          # initialize the TTS engine
engine.setProperty('rate', 100)  # speech rate in words per minute
engine.say(sentence)             # sentence is the string to be pronounced
engine.runAndWait()              # execute the queued speech command
The code initializes a TTS engine and assigns it to the variable engine. The setProperty method defines the speed of the utterance. The say method registers the text sentence to be pronounced. Finally, the runAndWait method executes the text-to-speech operation.
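To tie this back to the OCR pipeline, one option is to build sentence from the confident detections in the result variable from Step 2. A minimal sketch:

# speak every OCR detection with a confidence of at least 0.5
sentence = ' '.join(text for (_, text, prob) in result if prob >= 0.5)

engine = pyttsx3.init()
engine.setProperty('rate', 100)
engine.say(sentence)
engine.runAndWait()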
Conclusion
This article explains how to extract text elements from images using EasyOCR. It also shows how to overlay the recognized text on images using OpenCV. A simple text-to-speech application is also introduced as an extension for the OCR output.
References
[1] https://github.com/madmaze/pytesseract
[2] https://github.com/JaidedAI/EasyOCR
[3] https://pytorch.org/get-started/locally/
[4] https://pypi.org/project/pyttsx3/