I found a question about a random word generator game on the question-and-answer site StackOverflow. The question contained a small, runnable version of the game code.
The author's question was: where can I find large English word lists on the internet?
A large word list would give the game more replay value and could make it much more compelling for end users.
I thought the question's small, readable code contained many interesting ideas that I could expand upon. I could use it to learn how Python handles randomness that users can interact with. I could also build other features around the word list to make the game more robust.
The StackOverflow question was titled "Random word generator- Python".
Motivational Example Word Game
Here’s the runnable code of the game:
```python
import random

WORDS = ("python", "jumble", "easy", "difficult", "answer", "xylophone")
word = random.choice(WORDS)
correct = word
jumble = ""
while word:
    position = random.randrange(len(word))
    jumble += word[position]
    word = word[:position] + word[(position + 1):]
print(
    """
    Welcome to WORD JUMBLE!!!

    Unscramble the letters to make a word.
    (press the enter key at prompt to quit)
    """
)
print("The jumble is:", jumble)
guess = input("Your guess: ")
while guess != correct and guess != "":
    print("Sorry, that's not it")
    guess = input("Your guess: ")
if guess == correct:
    print("That's it, you guessed it!\n")
print("Thanks for playing")
input("\n\nPress the enter key to exit")
```
The game randomly chooses a word from a list. It then jumbles the word by changing the order of its letters. To do this, the code picks a random number from 0 to the length of the word minus 1 and uses it as an index. The letter at that index is appended to the jumbled string and removed from the word, and this repeats until no letters are left. The player then has to work out the correct word by unscrambling the letters.
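Here is the letter-by-letter shuffle pulled out into a standalone function. This is a minimal sketch. The `rng` parameter is my own addition, not part of the original, and lets the shuffle be seeded for testing. `shuffle_word_sample` is included only as a shorter standard-library alternative that also produces a random ordering of the same letters.

```python
import random


def shuffle_word(word, rng=random):
    """Build a jumble by repeatedly removing a random letter from `word`."""
    jumble = ""
    while word:
        # Pick an index from 0 to len(word) - 1 (randrange excludes the stop value).
        position = rng.randrange(len(word))
        # Append the chosen letter to the jumble...
        jumble += word[position]
        # ...and remove it from the letters that are left.
        word = word[:position] + word[position + 1:]
    return jumble


def shuffle_word_sample(word, rng=random):
    """Alternative: random.sample returns every letter in a random order."""
    return "".join(rng.sample(word, len(word)))


rng = random.Random(42)  # seeded only so the run is reproducible
print(shuffle_word("python", rng))
print(shuffle_word_sample("python", rng))
```

Each letter is used exactly once, so the jumble always contains the same letters as the original word, only in a different order.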
Next, the user tries to unscramble the letters, types a guess, and presses Enter. If the guess is wrong, the game prints "Sorry, that's not it" and asks again, so the user keeps guessing until they get the correct word. When the user guesses correctly (for example, python, if that was the word chosen), the program prints "That's it, you guessed it!" followed by "Thanks for playing". Pressing Enter without typing a guess also leaves the guessing loop. The game ends when the user presses Enter at the final prompt.
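You can make the guessing loop testable by passing the guesses in instead of calling `input()` directly. In this sketch, `play_round` and its `guesses` parameter are hypothetical names. It follows the original rules: a wrong guess asks again, a correct guess wins, and an empty guess quits.

```python
def play_round(correct, guesses):
    """Return True if `correct` is guessed, False if the player quits.

    `guesses` is any iterable of strings standing in for input() calls.
    """
    for guess in guesses:
        if guess == correct:
            print("That's it, you guessed it!\n")
            return True
        if guess == "":
            # An empty guess (just pressing Enter) quits, as in the original game.
            return False
        print("Sorry, that's not it")
    # Running out of guesses is treated the same as quitting.
    return False


# Scripted guesses make a round easy to test without a keyboard.
print(play_round("python", ["jumble", "pyhton", "python"]))  # True

# In the real game, an endless stream of keyboard input could be passed instead:
# play_round(correct, iter(lambda: input("Your guess: "), None))
```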
In the second line of code, which defines WORDS, the author of the question uses only a handful of words hardcoded into the program. I found some larger word lists and improved how they are retrieved and how a word is randomly chosen from them. I also cleaned up the word lists to remove inconsistencies in type and formatting.
How to Get Word Lists
The StackOverflow question about where to find word lists had several answers. The answer that the author accepted used a 10,000-word list from MIT ("word lists – MIT"). It showed how to read the word list either with a web request or from the hard drive.
The answer did not integrate this with the code from the StackOverflow question, so I wrote two functions: one that fetches a word list with a web request and one that reads a word list from a file.
Some word lists were read in as plain text strings and others as bytes. (A web request made with requests returns the response body as bytes in response.content.)
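One way to handle both sources is to turn everything into a list of str lines as soon as it is read. This is a minimal sketch that assumes the lists are UTF-8 encoded. The function names are mine, not from the original code. To keep the example runnable offline, a bytes literal stands in for the web response.

```python
import tempfile
from pathlib import Path


def lines_from_bytes(content):
    """Decode raw bytes (such as requests' response.content) into str lines."""
    return content.decode("utf-8").splitlines()


def lines_from_file(path):
    """Read a UTF-8 text file into str lines; Path handles separators on any OS."""
    return Path(path).read_text(encoding="utf-8").splitlines()


# Stand-in for response.content, so the example runs without a network call.
web_like = b"python\njumble\neasy\n"
print(lines_from_bytes(web_like))  # ['python', 'jumble', 'easy']

# Write a small temporary word list to show the file path works the same way.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = Path(tmp) / "words.txt"
    sample_file.write_text("python\njumble\n", encoding="utf-8")
    print(lines_from_file(sample_file))  # ['python', 'jumble']
```

Using splitlines() also removes the trailing newline characters that readlines() keeps, so there is less to clean up later.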
I found a collection of word lists:
- word lists – MIT: I used this 10,000-word list.
- Natural Language Corpus Data: Beautiful Data. This word list contains the most frequently used words from 2008 to 2009 and also shows how many times each word was used.
- Grade-level spelling word lists: This is a good collection of spelling lists for kids, from second grade up to eighth grade, which could be useful if the game is designed for children. I made the third-grade list (THIRD_GRADE_WORDS) the default so I could more easily guess the words while testing.
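The Natural Language Corpus Data list records how often each word was used, so its counts could be used to favor common, easier words. This is a minimal sketch that assumes each line holds a word and a count separated by whitespace (I believe the file uses a tab). The sample lines and their counts are made up for illustration, and the `top_n` cutoff is a hypothetical parameter.

```python
def parse_counts(lines):
    """Turn 'word<whitespace>count' lines into (word, count) pairs."""
    pairs = []
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines instead of crashing on them.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        pairs.append((parts[0], int(parts[1])))
    return pairs


def most_common_words(lines, top_n):
    """Return the top_n words, most frequent first."""
    pairs = parse_counts(lines)
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in pairs[:top_n]]


# Hypothetical sample lines with invented counts; the real file is much larger.
sample = ["apple\t500", "zebra\t200", "", "house\t300"]
print(most_common_words(sample, 2))  # ['apple', 'house']
```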
I saw a couple of other word lists that I chose not to use because they'd require scraping from the web, were proprietary, or did not seem as comprehensive. There did seem to be other good word lists on Kaggle.
Adding Value to the Game
The most fun part of this coding exercise was adding features to the game. I added code for retrieving the word lists. I also added a feature that enforces a minimum degree of scrambling, so that unjumbling the word stays challenging.
I also added value to the game by:
- Adding game settings:
  - MINIMUM_WORD_LENGTH = 5 and MAXIMUM_WORD_LENGTH = 7 control the length of the words the user has to guess.
  - WORDS_FROM_FILE is a flag that decides whether the word list is read from a file or fetched with a web request.
  - WORD_LIST_TO_USE lets the user choose which word list to play with.
```python
# GAME SETTINGS
MINIMUM_WORD_LENGTH = 5
MAXIMUM_WORD_LENGTH = 7
WORDS_FROM_FILE = False
WORD_LIST_TO_USE = "THIRD_GRADE_WORDS"
```
- Creating functions to make the code more testable, as you can see throughout the full code.
- Cleaning up the words in the word lists so they can be read in whether they are bytes or strings:
  - The MIT word list was read in as bytes, while other word lists were strings. I changed the code to convert bytes into strings so the word can be jumbled. I also split the logic into separate functions so I could easily test that the conversion from bytes to strings worked.
  - Some word lists had additional characters, such as numbers. I used a regular expression to remove these extra characters.
```python
def format_words(words):
    if len(words) > 1:
        words_pattern = '[a-z]+'
        cleaned = []
        for word in words:
            # Web requests return bytes; decode them so every word is a str.
            if isinstance(word, bytes):
                word = word.decode('utf-8')
            # Keep only the first run of letters (drops counts, digits, punctuation).
            match = re.search(words_pattern, word, flags=re.IGNORECASE)
            # Skip lines with no letters (e.g. blank lines) instead of raising IndexError.
            if match:
                cleaned.append(match.group())
        # Keep only words within the configured length range.
        words = [word for word in cleaned
                 if MINIMUM_WORD_LENGTH <= len(word) <= MAXIMUM_WORD_LENGTH]
    return words
```
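To see exactly what the `[a-z]+` pattern keeps, here is a standalone sketch that applies the same case-insensitive pattern to a few kinds of lines. The `first_word` helper is hypothetical and not part of the original code. Only the first run of letters survives. As a result, a contraction like "don't" is cut down to "don", and a line with no letters yields nothing.

```python
import re

# Same pattern and flag as format_words: runs of letters, any case.
WORD_PATTERN = re.compile("[a-z]+", flags=re.IGNORECASE)


def first_word(line):
    """Return the first run of letters in `line` (bytes or str), or None."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    match = WORD_PATTERN.search(line)
    return match.group() if match else None


samples = [b"abandon\n", "the\t23135851162", "Xylophone!", "don't", "2008", ""]
print([first_word(s) for s in samples])
# ['abandon', 'the', 'Xylophone', 'don', None, None]
```

Because the search ignores case, capitalized words keep their capitals, which is why the game lowercases the chosen word before comparing guesses.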
- Making it easy to swap between word lists by adding dictionaries that map each list name to its URL or file name:
```python
if WORDS_FROM_FILE:
    words = get_words_from_file(WORD_LIST_FILE[WORD_LIST_TO_USE])
else:
    words = get_word_list_from_web(WORD_LIST_WEB[WORD_LIST_TO_USE])
words = format_words(words)
```
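If WORD_LIST_TO_USE is misspelled, the dictionary lookup above fails with a bare `KeyError`. This minimal sketch reports the valid names instead. `pick_source` is a hypothetical helper, and the dictionaries are trimmed copies of the ones in the full code.

```python
WORD_LIST_WEB = {
    "MIT_WORDS": "https://www.mit.edu/~ecprice/wordlist.10000",
    "THIRD_GRADE_WORDS": "http://www.ideal-group.org/dictionary/p-3_ok.txt",
}
WORD_LIST_FILE = {
    "MIT_WORDS": "mit_wordlist.10000",
    "THIRD_GRADE_WORDS": "p-3_ok.txt",
}


def pick_source(name, from_file):
    """Return the file name or URL registered under `name`."""
    sources = WORD_LIST_FILE if from_file else WORD_LIST_WEB
    try:
        return sources[name]
    except KeyError:
        valid = ", ".join(sorted(sources))
        raise ValueError(f"Unknown word list {name!r}; choose one of: {valid}") from None


print(pick_source("MIT_WORDS", from_file=True))  # mit_wordlist.10000
```

Adding a new word list then only means adding one entry to each dictionary; the selection code stays the same.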
- Making sure that the word is jumbled enough to make guessing fun:
  - I added code that uses SequenceMatcher, from Python's difflib module, to enforce a minimum degree of scrambling. If the shuffled word is more than 50% similar to the original, the word is shuffled again, and this repeats until the threshold is met.
  - Here's how SequenceMatcher works: "SequenceMatcher in Python. A human-friendly longest contiguous &…" by Nikhil Jaiswal, Towards Data Science.
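```python
def generate_unique_shuffled_word(word):
    while True:
        shuffled_word = shuffle_word(word)
        # ratio() is 1.0 for identical strings and lower for less similar ones.
        similar_percent = SequenceMatcher(None, shuffled_word, word).ratio()
        # Accept the shuffle once it is at most 50% similar to the original.
        # Words shorter than 5 letters are accepted as-is so the loop can't hang on them.
        if len(word) < 5 or similar_percent <= 0.5:
            break
    return shuffled_word
```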
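`SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matching characters and T is the total length of both strings. A value of 1.0 means the strings are identical, and values near 0 mean they have little in common. The loop above can run for a long time, or forever, if no shuffle of the word falls below the threshold (for example, a word made of one repeated letter). This minimal sketch caps the number of attempts and keeps the least similar shuffle it finds. The `threshold`, `max_attempts`, and `rng` parameters are my own assumptions, not part of the original.

```python
import random
from difflib import SequenceMatcher


def similarity(a, b):
    """difflib's ratio 2*M/T: 1.0 = identical, 0.0 = nothing in common."""
    return SequenceMatcher(None, a, b).ratio()


def jumble_with_limit(word, threshold=0.5, max_attempts=1000, rng=random):
    """Shuffle until similarity <= threshold, giving up after max_attempts.

    Returns the least similar shuffle seen, so it always terminates.
    """
    best = word
    best_score = similarity(word, word)  # 1.0: a word is identical to itself
    for _ in range(max_attempts):
        candidate = "".join(rng.sample(word, len(word)))
        score = similarity(candidate, word)
        if score < best_score:
            best, best_score = candidate, score
        if score <= threshold:
            break
    return best


print(similarity("abcd", "bcda"))  # 0.75: "bcd" matches, so 2*3/8
print(jumble_with_limit("python", rng=random.Random(1)))
print(jumble_with_limit("aaaaa"))  # aaaaa: every shuffle is identical
```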
Full Code
```python
import random
import re
from difflib import SequenceMatcher
from pathlib import Path

import requests

# GAME SETTINGS
MINIMUM_WORD_LENGTH = 5
MAXIMUM_WORD_LENGTH = 7
WORDS_FROM_FILE = False
WORD_LIST_TO_USE = "THIRD_GRADE_WORDS"

# Where each word list lives on the web...
WORD_LIST_WEB = {
    "MIT_WORDS": "https://www.mit.edu/~ecprice/wordlist.10000",
    "NORVIG_WORDS": "http://norvig.com/ngrams/count_1w.txt",
    "THIRD_GRADE_WORDS": "http://www.ideal-group.org/dictionary/p-3_ok.txt",
}

# ...and the local file name to use when WORDS_FROM_FILE is True.
WORD_LIST_FILE = {
    "MIT_WORDS": "mit_wordlist.10000",
    "NORVIG_WORDS": "norvig_count_1w.txt",
    "THIRD_GRADE_WORDS": "p-3_ok.txt",
}


def get_word_list_from_web(word_site):
    # response.content is bytes, so this returns a list of bytes lines.
    response = requests.get(word_site, timeout=10)
    response.raise_for_status()
    return response.content.splitlines()


def format_words(words):
    if len(words) > 1:
        words_pattern = '[a-z]+'
        cleaned = []
        for word in words:
            # Web requests return bytes; decode them so every word is a str.
            if isinstance(word, bytes):
                word = word.decode('utf-8')
            # Keep only the first run of letters (drops counts, digits, punctuation).
            match = re.search(words_pattern, word, flags=re.IGNORECASE)
            # Skip lines with no letters (e.g. blank lines) instead of raising IndexError.
            if match:
                cleaned.append(match.group())
        # Keep only words within the configured length range.
        words = [word for word in cleaned
                 if MINIMUM_WORD_LENGTH <= len(word) <= MAXIMUM_WORD_LENGTH]
    return words


def get_words_from_file(word_path):
    # Resolve the file relative to the current working directory;
    # Path joins paths correctly on any operating system.
    word_file_path = Path().absolute() / word_path
    with open(word_file_path, encoding="utf-8") as word_file:
        return word_file.readlines()


def shuffle_word(word):
    jumble = ""
    while word:
        position = random.randrange(len(word))
        jumble += word[position]
        word = word[:position] + word[(position + 1):]
    return jumble


def generate_unique_shuffled_word(word):
    while True:
        shuffled_word = shuffle_word(word)
        similar_percent = SequenceMatcher(None, shuffled_word, word).ratio()
        # Accept the shuffle once it is at most 50% similar to the original;
        # words shorter than 5 letters are accepted as-is so the loop can't hang.
        if len(word) < 5 or similar_percent <= 0.5:
            break
    return shuffled_word


def main():
    print(
        """
    Welcome to WORD JUMBLE!!!

    Unscramble the letters to make a word.
    (press the enter key at prompt to quit)
    """
    )
    if WORDS_FROM_FILE:
        words = get_words_from_file(WORD_LIST_FILE[WORD_LIST_TO_USE])
    else:
        words = get_word_list_from_web(WORD_LIST_WEB[WORD_LIST_TO_USE])
    words = format_words(words)
    word = random.choice(words).lower()
    jumbled_word = generate_unique_shuffled_word(word)
    correct_word = word
    print(jumbled_word)
    guess = input("Your guess: ")
    while guess != correct_word and guess != "":
        print("Sorry, that's not it")
        guess = input("Your guess: ")
    if guess == correct_word:
        print("That's it, you guessed it!\n")
    print("Thanks for playing")
    input("\n\nPress the enter key to exit")


if __name__ == "__main__":
    main()
```
This version of the code is available on GitHub.
Conclusion
I learned about Python, different word lists, and how to implement randomness for a user. Most importantly, I had fun coding it!
I hope you had fun reading about it as well, and that you picked something up from this simple piece of code.