I will show you the steps I took to create a readability and grammar checker app using Streamlit. You can use it to improve your programming skills and add to your portfolio.
💡 Info: Streamlit is a popular open-source app framework among data scientists as it’s used for developing and deploying Machine Learning and Data Science web apps in minutes.
As we will see, Streamlit goes beyond turning data scripts into shareable web apps. Programmers use it to create anything within its capabilities. A quiz app, an anagram app, and a currency converter app are some of them.
Project Overview
A readability checker provides a quick way to assess how easily readers can understand a text. This is especially helpful if you are writing a book or a blog and want to know where to improve readability for various audiences.
The Python ecosystem offers third-party libraries and frameworks for almost any kind of application.
There’s no need to reinvent the wheel, as the heavy lifting is already done for us. With a few libraries and a bit of finishing touch from us, we will get our readability and grammar checker app up and running in no time.
Try It! You can check out my app here: click this link to view my app live on Streamlit Cloud.
Prerequisites
This tutorial assumes nothing more than a basic knowledge of Python programming, including functions, if…else, and for loops.
👉 Recommended: Python Crash Course on the Finxter Blog
Although I try my best to explain the procedures, I won’t explain every step, so I encourage you to get comfortable with the basics first. I expect you to have that background knowledge already.
Importing Libraries
Before we get started, let’s import the libraries we will be using in this project.
import streamlit as st
import textstat as ts
from pdfminer.high_level import extract_text
from pdfminer.layout import LTTextContainer
from io import StringIO
import docx2txt
import requests
from bs4 import BeautifulSoup as bs
import language_tool_python
Everything above is self-explanatory. We will use textstat to check the readability of a text. We will also use io to extract text from a TXT document. The library language_tool_python will help us check spelling and grammar. I will explain other libraries as we proceed.
Our project is a combination of several functions we define, which are all linked together to get the job done. So, without further ado, let’s get started.
The Main Function
Our project starts with what we call the main() function, which offers several options. Selecting an option triggers the execution of another function.
def main():
    mode = st.sidebar.selectbox('Select your option', ['Text', '.pdf', '.txt', '.docx', 'Online'])
    # a function is called depending on the mode selected
    if mode == 'Text':
        text_result()
    elif mode == '.pdf':
        upload_pdf()
    elif mode == '.txt':
        upload_txt()
    elif mode == '.docx':
        upload_docx()
    else:
        get_url()

…

if __name__ == '__main__':
    main()
We want to give our app users the option to choose the form of their document: they can copy and paste text into the textbox, upload an e-book, or pull text from a web page. We call Streamlit to display these options in a sidebar.
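As the number of modes grows, the if…elif chain can also be written as a lookup table. The following is only a sketch of that alternative, not the code used in the app: the handler functions here are hypothetical stand-ins that return strings so the example runs on its own, and HANDLERS and dispatch() are names I made up for illustration.

```python
# Hypothetical stand-ins for the real handlers, which would draw Streamlit widgets.
def text_result():
    return 'text box'

def upload_pdf():
    return 'pdf uploader'

def upload_txt():
    return 'txt uploader'

def upload_docx():
    return 'docx uploader'

def get_url():
    return 'url input'

# Map each sidebar option to the function that handles it.
HANDLERS = {
    'Text': text_result,
    '.pdf': upload_pdf,
    '.txt': upload_txt,
    '.docx': upload_docx,
}

def dispatch(mode):
    # Fall back to get_url(), mirroring the final else branch in main().
    handler = HANDLERS.get(mode, get_url)
    return handler()

print(dispatch('.pdf'))
print(dispatch('Online'))
```

With this layout, adding a new mode means adding one dictionary entry instead of another elif branch.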
At the very end of our script, we check whether the __name__ variable equals '__main__', which is only true when the script is run directly. If it is, we call the main() function. This ensures main() runs as soon as Streamlit runs our script, but does not run when the script is imported into another program.
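To see the guard in isolation, here is a tiny sketch. The main() below is a placeholder of my own that just returns a string; it is not the app's main().

```python
def main():
    # Placeholder body; the real main() would build the Streamlit sidebar.
    return 'app started'

# Python sets __name__ to '__main__' only for the file being run directly.
# When this file is imported from another module, __name__ holds the module
# name instead, so the call below is skipped.
if __name__ == '__main__':
    print(main())
```

Importing this file from another script makes main() available without running it.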
👉 Recommended: Python __name__ == '__main__' Explained
The Textbox
If our user selects ‘Text’, the text_result() function will execute. The function calls on Streamlit to display a textbox using st.text_area labeled ‘Text Field’, and the default text stored in the text variable will appear in the textbox.
def text_result():
    text = 'Your text goes here...'
    # displaying the textbox where texts will be written
    box = st.text_area('Text Field', text, height=200)
    scan = st.button('Scan File')
    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))
The function also calls on Streamlit to insert a button which, when pressed, causes Streamlit to display readability results using st.write.
The text_result() function sends the text held in the box variable to another function, readability_checker(), and st.write() displays the result.
def readability_checker(w):
    stats = dict(
        flesch_reading_ease=ts.flesch_reading_ease(w),
        flesch_kincaid_grade=ts.flesch_kincaid_grade(w),
        automated_readability_index=ts.automated_readability_index(w),
        smog_index=ts.smog_index(w),
        coleman_liau_index=ts.coleman_liau_index(w),
        dale_chall_readability_score=ts.dale_chall_readability_score(w),
        linsear_write_formula=ts.linsear_write_formula(w),
        gunning_fog=ts.gunning_fog(w),
        word_count=ts.lexicon_count(w),
        difficult_words=ts.difficult_words(w),
        text_standard=ts.text_standard(w),
        sentence_count=ts.sentence_count(w),
        syllable_count=ts.syllable_count(w),
        reading_time=ts.reading_time(w)
    )
    return stats
So what text_result() does is accept input and, when prompted, send the input to the readability_checker() function, which scans it and returns the results in the form of a dictionary.
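To give a feel for what one of these scores means, here is a sketch of the standard Flesch Reading Ease formula computed by hand. The function name and the sample counts are my own for illustration. The textstat library counts words, sentences, and syllables in its own way, so its output for a real text will not necessarily match a hand count.

```python
def flesch_reading_ease_from_counts(words, sentences, syllables):
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Made-up counts: 100 words in 5 sentences with 150 syllables in total.
score = flesch_reading_ease_from_counts(words=100, sentences=5, syllables=150)
print(round(score, 1))
```

Longer sentences and longer words both pull the score down, which is why the app reports sentence and syllable counts alongside it.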
👉 Recommended Tutorial: Python Dictionary – Ultimate Guide
That’s all it takes to set up our readability checker app.
If this were the only option in our main function, we would call it a day. But we want to give our users more options to choose from, and the more features we add, the more Python code we need to write to support them.
PDF Mode
Back to our main() function. If our users select the pdf option, the upload_pdf() function will execute.
def upload_pdf():
    file = st.sidebar.file_uploader('Choose a file', type='pdf')
    if file is not None:
        pdf = extract_text(file)
        # sending the text to textbox
        document_result(pdf)
This function calls Streamlit to produce a file uploader that lets us upload a PDF file. When we upload the file, the extract_text() function from pdfminer does the heavy lifting for us. By default, Streamlit accepts all file extensions. Specifying the type restricts the uploader to that extension only.
The Setback
I wanted to make this process as seamless as possible.
What I wanted to do was to call on the pdfminer library to extract the text and send it to readability_checker(), which scans it and produces the result displayed by st.write(), without the content of the file ever appearing on screen.
I wasn’t able to do so. Hence, I would appreciate it if anyone could reach out to me with a solution to this problem.
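One possible direction, offered only as an untested sketch, is to pass the extracted text straight to the checker and never hand it to a textbox. The helper name scan_without_preview() and the two stand-in callables below are hypothetical; in the app, the real extract_text() and readability_checker() would take their place.

```python
def scan_without_preview(uploaded_file, extract, checker):
    """Extract text from an upload and score it without displaying the text."""
    if uploaded_file is None:
        return None  # nothing has been uploaded yet
    text = extract(uploaded_file)
    return checker(text)

# Hypothetical stand-ins so the sketch runs without pdfminer or textstat.
def fake_extract(file_obj):
    return file_obj.upper()

def fake_checker(text):
    return {'word_count': len(text.split())}

result = scan_without_preview('three short words', fake_extract, fake_checker)
print(result)
```

Inside upload_pdf(), this might be used as st.write(scan_without_preview(file, extract_text, readability_checker)), assuming nothing else in the app depends on the textbox.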
A Workaround
I wasn’t deterred, though.
Since there are so many ways to kill a rat, I found a workaround with a little help from Streamlit. I benefited from Streamlit’s ability to display text as the default content of a textbox, as seen in our text_result() function.
So, I created a function like text_result() but with a parameter that will collect the very text extracted from the PDF file and have it displayed in the textbox.
Give me a round of applause. That’s my feat of engineering! Alright, let’s implement it.
def document_result(file):
    # displaying the textbox where texts will be written
    box = st.text_area('Text Field', file, height=200)
    scan = st.button('Scan Text')
    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))
Make sure you are using the latest version of pdfminer, installed with pip as ‘pip install pdfminer.six’.
Alright, we have gotten past that setback. The PDF text is still displayed inside the textbox, which is not bad after all.
The only downside comes from the pdfminer library. It takes time to process bulky files. You may want to try other libraries in your project.
When users choose other options in our main() function, the respective functions execute in the same way: they extract the text using the imported libraries and send it to the document_result() function, which, in turn, passes the text to readability_checker() to scan. Finally, it displays the result.
You may want to check the documentation to know more about the imported libraries that help to extract the files.
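For the ‘.txt’ option, the idea is to turn the uploaded bytes into a string with io.StringIO. The helper below is a sketch and not necessarily what the app does: the name txt_bytes_to_text() and the UTF-8 default are my assumptions. In Streamlit, the raw bytes would come from calling getvalue() on the uploaded file.

```python
from io import StringIO

def txt_bytes_to_text(raw, encoding='utf-8'):
    """Decode uploaded bytes and read them back as one string."""
    # Assumes the file is UTF-8 unless told otherwise.
    buffer = StringIO(raw.decode(encoding))
    return buffer.read()

sample = 'First line.\nSecond line.'.encode('utf-8')
text = txt_bytes_to_text(sample)
print(text)
```

The resulting string can then be handed to document_result() like the text extracted from a PDF.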
The ‘Online’ Option
This option allows our users to check the readability of content found on web pages.
def get_url():
    url = st.sidebar.text_input("Paste your url")
    if url:
        get_data(url)
As usual, when we select the option, it triggers the execution of the get_url() function.
The get_url() function uses st.sidebar.text_input to provide a small box where you can paste your URL. Once you hit the Enter key, it sends the URL to the get_data() function.
def get_data(url):
    page = requests.get(url)
    if page.status_code != 200:
        # show the error inside the app instead of printing to the console
        st.error('Error fetching page')
        return
    content = page.content
    soup = bs(content, 'html.parser')
    document_result(soup.get_text())
What the get_data() function is doing is web scraping.
It sends a request to fetch the content of the URL.
👉 Recommended Tutorial: How to Get the URL Content in Python
If it is successful, it returns the content of the web page. The function then calls the BeautifulSoup library to parse the content in pure HTML form.
Using the get_text() method from BeautifulSoup, get_data() extracts the content without any HTML tags and sends it to the document_result() function, which I have explained before.
The downside of using this option is that it scrapes whatever it sees on the web page, including the navigation bar, header, footer, and comments, which may not be relevant for readability checking.
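One way to reduce that noise is to skip text inside tags that usually hold page furniture. The sketch below uses only the standard library’s html.parser so it runs on its own. The set of skipped tags and the names MainTextExtractor and extract_main_text() are my own choices, and real pages may need a different list. The same idea could be applied with BeautifulSoup by removing those tags before calling get_text().

```python
from html.parser import HTMLParser

# Tags whose text is usually not part of the article body (an assumption).
SKIPPED_TAGS = {'nav', 'header', 'footer', 'script', 'style', 'aside'}

class MainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many skipped tags we are currently inside
        self.chunks = []      # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped tag.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_main_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks)

page = ('<html><body><nav>Home | About</nav>'
        '<p>Readable text.</p><footer>Copyright</footer></body></html>')
print(extract_main_text(page))
```

This does not catch comment sections or sidebars built from plain div tags, so some irrelevant text can still slip through.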
Grammar Checker
If you have been following along, you will notice, from the above image, another button besides the readability checker button.
That is our grammar checker button. Alright, let me show you how I did it.
I left it out of the Python scripts above so we could focus on one thing at a time. The script below is our updated text_result() function.
def text_result():
    text = 'Your text goes here...'
    box = st.text_area('Text Field', text, height=200)
    left, right = st.columns([5, 1])
    scan = left.button('Check Readability')
    grammar = right.button('Check Grammar')
    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))
    elif grammar:
        st.write(grammar_checker(box))
Streamlit’s columns() method enables us to display our buttons side by side.
By passing it a list of [5, 1], we set the relative widths of the two columns, which determines where the buttons appear. Also, notice how we used left.button() instead of st.button(). This is because we want to place each button in the column we created using st.columns.
The if…elif statement keeps the app flexible and neat. Streamlit reruns the script on every button press, so if we press the grammar checker button, any readability result already on screen is replaced by the grammar result.
Let us also update the document_result() function.
def document_result(file):
    box = st.text_area('Text Field', file, height=200)
    left, right = st.columns([3, .75])
    with left:
        scan = st.button('Check Readability')
    with right:
        grammar = st.button('Check Grammar')
    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))
    elif grammar:
        st.write(grammar_checker(box))
Again, notice another way we use st.columns to achieve the same result. The ‘with’ notation places any element created inside its block in the specified column. Then comes the grammar_checker() function.
def grammar_checker(text):
    tool = language_tool_python.LanguageTool('en-US', config={'maxSpellingSuggestions': 1})
    check = tool.check(text)
    result = []
    for i in check:
        result.append(i)
        result.append(f'Error in text => {text[i.offset : i.offset + i.errorLength]}')
        result.append(f'Can be replaced with => {i.replacements}')
        result.append('--------------------------------------')
    return result
LanguageTool() checks a text for grammar and spelling errors. It comes bundled in the language_tool_python module, but LanguageTool itself is also used from other programming languages.
To use it, make sure you have Java installed on your system. Once we call it and save it in the tool variable, it will download everything necessary to check your text for American English only. The size is 225MB excluding Java.
This download lets you use it offline. To use it online, please check the documentation. We added maxSpellingSuggestions to speed up the checking process, especially when dealing with millions of characters.
We append each item to the ‘result’ list so that st.write() can display it when called. To know more about how to use the language_tool_python module, please consult the documentation.
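The slice text[i.offset : i.offset + i.errorLength] is what picks the flagged words out of the original text. The sketch below shows that arithmetic with a hypothetical stand-in for a match object, so it runs without Java or language_tool_python. FakeMatch, flagged_span(), and apply_first_suggestion() are names I made up, and the match values are hand-written rather than produced by the checker.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the three fields used in grammar_checker().
FakeMatch = namedtuple('FakeMatch', ['offset', 'errorLength', 'replacements'])

def flagged_span(text, match):
    """Return the part of the text that the match points at."""
    return text[match.offset : match.offset + match.errorLength]

def apply_first_suggestion(text, match):
    """Swap the flagged span for the first suggestion, if there is one."""
    if not match.replacements:
        return text
    end = match.offset + match.errorLength
    return text[:match.offset] + match.replacements[0] + text[end:]

sample = 'This is a smple sentence.'
match = FakeMatch(offset=10, errorLength=5, replacements=['simple'])

print(flagged_span(sample, match))
print(apply_first_suggestion(sample, match))
```

When applying several corrections to the same text, working from the last match to the first keeps the earlier offsets valid.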
Deployment
It would be nice to have our new app available for others with little or no programming knowledge to see and use. Deploying the app makes that possible.
If you want to deploy on Streamlit Cloud, it’s very easy. Set up a GitHub account if you have not already done so. Create and upload files to your GitHub repository.
Then, you set up a Streamlit Cloud account. Create a New App and link your GitHub account. Streamlit will do the rest.
Any changes you make to the repository will be reflected in the app. To avoid encountering errors while deploying your app, go to my GitHub page and look at the other files I included to enable easy deployment on Streamlit Cloud.
Conclusion
This brings us to the end of this tutorial on how I built a readability and grammar checker app using Streamlit.
I explained it in a way you can understand. You can visit my GitHub page to view the full project. Also, click this link to view my app live on Streamlit Cloud. Alright, that’s it. Go on, give it a try and create awesome apps.