5 Best Ways to Convert Text to Speech in Python - Be on the Right Side of Change

💡 Problem Formulation: How can we make our Python programs speak out text? This question concerns the process of converting strings of text into audible speech, which can be helpful for accessibility and user interface improvement. For instance, you would input the string “Hello World” and expect to hear an audio output of a voice saying “Hello World”.

Method 1: Using pyttsx3 Library

Pyttsx3 is a text-to-speech conversion library for Python. Unlike many alternative libraries, it works offline, because it drives the speech engine already installed on your system: SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux. The engine is quite flexible, allowing for varying speech rates, volumes, and voices.

Here’s an example:

import pyttsx3
engine = pyttsx3.init()
engine.say("Hello, Python enthusiasts!")
engine.runAndWait()

The output will be an audible “Hello, Python enthusiasts!” spoken by the system’s default text-to-speech voice.

This code snippet initializes the text-to-speech engine with pyttsx3.init(), queues the text with say(), and then processes the queue with runAndWait(). Nothing is spoken until runAndWait() is called, and the call blocks until all queued speech has finished before the script moves on.
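The rate, volume, and voice settings mentioned above are read and changed through the engine's getProperty() and setProperty() methods. The sketch below wraps them in a small helper. The names configure_engine and speak_with_settings, and the default values, are illustrative assumptions rather than part of pyttsx3; the package is imported inside speak_with_settings() so the helper itself has no hard dependency on it.

```python
def configure_engine(engine, rate_delta=-50, volume=0.8, voice_index=None):
    """Adjust rate, volume and (optionally) voice on a pyttsx3 engine.

    configure_engine is an illustrative helper, not part of pyttsx3.
    """
    # 'rate' is the speaking rate in words per minute; shift it from the current value
    engine.setProperty("rate", engine.getProperty("rate") + rate_delta)
    # 'volume' must lie between 0.0 and 1.0, so clamp the requested value
    engine.setProperty("volume", min(1.0, max(0.0, volume)))
    # 'voices' lists the voices installed on this machine; each one has an .id
    if voice_index is not None:
        voices = engine.getProperty("voices")
        engine.setProperty("voice", voices[voice_index].id)
    return engine


def speak_with_settings(text, **settings):
    """Speak text aloud with a configured pyttsx3 engine (blocks until done)."""
    import pyttsx3  # third-party: pip install pyttsx3

    engine = configure_engine(pyttsx3.init(), **settings)
    engine.say(text)
    engine.runAndWait()
```

For example, speak_with_settings("Hello, Python enthusiasts!", rate_delta=-50, volume=0.8) should say the greeting more slowly and a little more quietly than the default. Which voice a given voice_index selects depends on the voices installed on your system.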

Method 2: Utilizing gTTS (Google Text-to-Speech)

gTTS is a Python library and CLI tool that interfaces with Google Translate’s text-to-speech API. It writes the spoken text to an MP3 file and generally produces natural-sounding speech, but it requires an internet connection because the audio is generated on Google’s servers.

Here’s an example:

from gtts import gTTS
tts = gTTS('Hello world, this is Google Text-to-Speech in action!')
tts.save('hello.mp3')

The output will be an MP3 file, “hello.mp3”, containing the spoken phrase “Hello world, this is Google Text-to-Speech in action!”.

After importing the necessary module, you create a gTTS object with the text to speak, and then save it to an MP3 file. Playing this file will produce the audible text.
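Because gTTS only writes a file, playing it is left to you. One way, sketched below under stated assumptions, is to hand the MP3 to a command-line player: afplay ships with macOS, mpg123 is assumed to be installed on Linux, and on Windows os.startfile() opens the file in the default application. The lang and slow arguments are real gTTS options (language code and slower speech); the helper names player_command and save_and_play are hypothetical.

```python
import os
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return an argv list that plays an MP3 file (hypothetical helper)."""
    if platform == "darwin":
        return ["afplay", path]  # bundled with macOS
    if platform.startswith("linux"):
        return ["mpg123", "-q", path]  # assumption: mpg123 is installed
    raise NotImplementedError(f"no command-line player configured for {platform!r}")


def save_and_play(text, path="hello.mp3", lang="en", slow=False):
    """Synthesize text with gTTS (needs internet), save it as MP3, then play it."""
    from gtts import gTTS  # third-party: pip install gTTS

    gTTS(text, lang=lang, slow=slow).save(path)
    if sys.platform == "win32":
        os.startfile(path)  # opens the file in the default media player
    else:
        subprocess.run(player_command(path), check=True)
```

On macOS and Linux the player call blocks until playback ends, whereas os.startfile() on Windows returns immediately.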

Method 3: Using the speech_recognition Library

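The speech_recognition library (installed with pip install SpeechRecognition) performs speech recognition, the reverse of text-to-speech: it converts spoken audio into text using services such as Google Speech Recognition. It does not synthesize speech itself, but it is often paired with a text-to-speech library such as pyttsx3 to build two-way voice interfaces.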

Here’s an example:

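import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

try:
    # Send the recorded audio to Google's web speech API for transcription
    text = r.recognize_google(audio)
    print('You said: {}'.format(text))
except sr.UnknownValueError:
    print("Could not understand the audio.")
except sr.RequestError as e:
    print("Request to Google failed: {}".format(e))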

This will output transcribed text from what was spoken into the microphone.

This snippet uses the microphone to capture speech (sr.Microphone requires the PyAudio package), then sends the recording to Google’s speech recognition service to transcribe it. Note that this method performs speech recognition (speech-to-text), not text-to-speech conversion.

Method 4: Using the macOS ‘say’ Command through os.system()

For macOS users, the built-in ‘say’ command can be executed from a Python script to perform text-to-speech. Because it ships with every macOS installation, it is a convenient choice for quick tests on macOS systems.

Here’s an example:

import os
os.system("say 'Hello, Apple fanatics!'")

The output will be a clear “Hello, Apple fanatics!” spoken by the machine’s system voice.

This uses os.system() to pass the string to a shell, which runs ‘say’ with the quoted text. It’s a quick and simple method that doesn’t require the installation of extra libraries, but it works only on macOS, and any quotes inside the text must be escaped to keep the shell command valid.
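Embedding the text in a shell string means an apostrophe (as in “it’s”) breaks the quoting. A sketch that avoids the shell by passing an argument list to subprocess.run() is shown below. The -v (voice), -r (rate in words per minute) and -o (write to an audio file instead of speaking) options belong to macOS’s say; the helper names say_command and say_text are illustrative.

```python
import subprocess


def say_command(text, voice=None, rate=None, outfile=None):
    """Build an argv list for macOS's `say` (hypothetical helper)."""
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]  # a voice name listed by `say -v '?'`
    if rate:
        cmd += ["-r", str(rate)]  # speaking rate in words per minute
    if outfile:
        cmd += ["-o", outfile]  # save audio (e.g. .aiff) instead of speaking
    cmd.append(text)
    return cmd


def say_text(text, **options):
    """Run `say` without a shell, so quotes in text need no escaping (macOS only)."""
    subprocess.run(say_command(text, **options), check=True)
```

For example, say_text("It's working!", rate=180) speaks text that would need extra escaping in the os.system() version.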

Bonus One-Liner Method 5: Using Python’s subprocess Module

If you’re looking for a single line of code that can invoke your system’s native text-to-speech capabilities, the subprocess module is your friend. This module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

Here’s an example:

import subprocess
subprocess.run(["say", "Hello World in one line."], check=True)

The output will be “Hello World in one line.”, delightfully spoken by your system’s narrator.

This one-liner runs the ‘say’ program through the subprocess module. Passing the command as a list skips the shell, so quotes in the text need no escaping, and check=True raises CalledProcessError if the command fails. Although it’s concise, it is system-dependent: ‘say’ exists only on macOS, so on other systems you must substitute that platform’s text-to-speech command.
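To carry the one-liner idea beyond macOS, you can choose the command per platform. The sketch below assumes espeak is installed on Linux and uses PowerShell’s System.Speech synthesizer on Windows; tts_command and speak_native are hypothetical helpers, and the tools available on a given machine may differ.

```python
import subprocess
import sys


def tts_command(text, platform=sys.platform):
    """Return an argv list that speaks text with a built-in or common TTS tool.

    Hypothetical helper; the Linux branch assumes `espeak` is installed.
    """
    if platform == "darwin":
        return ["say", text]
    if platform.startswith("linux"):
        return ["espeak", text]
    if platform == "win32":
        # Inside a PowerShell single-quoted string, ' is escaped by doubling it
        quoted = text.replace("'", "''")
        script = (
            "Add-Type -AssemblyName System.Speech; "
            "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
            f".Speak('{quoted}')"
        )
        return ["powershell", "-NoProfile", "-Command", script]
    raise NotImplementedError(f"no text-to-speech command known for {platform!r}")


def speak_native(text):
    """Speak text with the current platform's command (blocks until finished)."""
    subprocess.run(tts_command(text), check=True)
```

With this in place, speak_native("Hello World in one line.") is again a one-liner at the call site.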

Summary/Discussion

Method 1: pyttsx3. The library works offline and is highly customizable. However, the voice quality can vary depending on the system.
Method 2: gTTS. Integrates with Google for high-quality sound, but requires an internet connection and generates an MP3 instead of real-time speech.
Method 3: speech_recognition. Useful for speech recognition (speech-to-text), but it does not perform text-to-speech at all. Requires internet for the Google API.
Method 4: macOS ‘say’. Simple and effective on macOS with no dependency on Python libraries, but not cross-platform.
Bonus Method 5: subprocess. Concise one-liner, but the command it runs is platform-specific (‘say’ is macOS-only), so it requires knowledge of each system’s TTS commands.