How to Install OpenAI Whisper (Win, Mac, Linux, Ubuntu)

Run pip install openai-whisper in your command line. Once it is installed, you can use Whisper to transcribe audio files.

pip install openai-whisper

Alternatively, you may use any of the following commands to install openai-whisper, depending on your environment (Linux, Ubuntu, Windows, macOS). One of them is likely to work!

💡 If you have only one version of Python installed:
pip install openai-whisper

💡 If you have Python 3 (and, possibly, other versions) installed:
pip3 install openai-whisper

💡 If the pip command isn't on your PATH or doesn't work:
python -m pip install openai-whisper
python3 -m pip install openai-whisper

💡 If you have Linux and you need to fix permissions (any one):
sudo pip3 install openai-whisper
pip3 install openai-whisper --user

💡 If you have Linux with apt (install pip through apt first, then install Whisper with pip):
sudo apt install python3-pip
pip3 install openai-whisper

💡 If you have Windows and you have set up the py alias:
py -m pip install openai-whisper

💡 If you have Anaconda (the package is distributed through the conda-forge channel):
conda install -c conda-forge openai-whisper

💡 If you have Jupyter Notebook:
!pip install openai-whisper
!pip3 install openai-whisper
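
On a machine with several interpreters (or in a notebook), pip can install into a different Python than the one you actually run. The following is a minimal sketch, assuming only the standard library: it checks whether the whisper module (the import name of the openai-whisper package) is importable and, if not, prints a pip command bound to the current interpreter. The helper names is_installed and pip_command are made up for this example.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_command(package):
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if is_installed("whisper"):
    print("The whisper module is importable from", sys.executable)
else:
    # Nothing is installed here; the command is only printed for you to run.
    print("Not found. Try:", " ".join(pip_command("openai-whisper")))
```

Printing the command instead of running it keeps the check free of side effects; you can pass the same list to subprocess.run if you prefer to install from Python.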

Installation with Upgrade

Upgrade pip and install the openai-whisper package using the following two commands, one after the other:

  • python3 -m pip install --upgrade pip
  • python3 -m pip install --upgrade openai-whisper
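
To confirm which versions you ended up with after the upgrade, you can ask the package metadata directly. This is a small sketch using the standard library's importlib.metadata; the function name installed_version is an assumption for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are the names used with pip, not the import names.
print("pip:", installed_version("pip"))
print("openai-whisper:", installed_version("openai-whisper"))
```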

Detailed Instructions

The codebase is compatible with Python versions 3.8 to 3.11 and recent PyTorch releases. Key dependencies include OpenAI's tiktoken for fast tokenization. To install or update to the latest release of Whisper, use:

pip install -U openai-whisper
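
Before installing, it can help to check that your interpreter falls inside the supported range. The sketch below hard-codes the 3.8 to 3.11 range stated in this article as an assumption; newer Whisper releases may accept other versions, so treat the bounds as adjustable. The name python_supported is hypothetical.

```python
import sys

# Bounds taken from the compatibility statement in this article (an assumption
# that may not match the Whisper release you install).
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 11)


def python_supported(version=None):
    """Return True if the (major, minor) version lies within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


current = "{}.{}".format(*sys.version_info[:2])
print("Python", current, "supported:", python_supported())
```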

For the latest repository version and dependencies, use:

pip install git+https://github.com/openai/whisper.git

To update to the repository’s latest version without dependencies:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

FFmpeg, a command-line tool, is also required and can be installed via various package managers:

  • For Ubuntu or Debian: sudo apt update && sudo apt install ffmpeg
  • For Arch Linux: sudo pacman -S ffmpeg
  • For MacOS with Homebrew: brew install ffmpeg
  • For Windows with Chocolatey: choco install ffmpeg
  • For Windows with Scoop: scoop install ffmpeg
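
Whisper calls FFmpeg to decode audio, so a missing ffmpeg binary is a common cause of errors on the first transcription. A minimal sketch, assuming only the standard library, that checks whether ffmpeg is reachable on your PATH (find_tool is a made-up helper name):

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


ffmpeg_path = find_tool("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg not found on PATH; install it with one of the commands above.")
else:
    # `ffmpeg -version` prints build information; show only its first line.
    out = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
    lines = out.stdout.splitlines()
    print(lines[0] if lines else "ffmpeg found at " + ffmpeg_path)
```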

If tiktoken does not provide a pre-built wheel for your platform, you may need to install Rust. If you run into installation errors, follow Rust's official getting-started instructions to set up a Rust development environment and, if needed, add Cargo's bin directory to your PATH environment variable (for example, export PATH="$HOME/.cargo/bin:$PATH"). If you see the error 'No module named setuptools_rust', install the missing module with pip install setuptools-rust.

Whisper Models

Whisper offers five model sizes, from tiny to large, with English-only versions available for four of them. The models trade off memory requirements, speed, and accuracy. The English-only models (names ending in .en) tend to perform better for English-only applications, especially tiny.en and base.en; the difference becomes less significant for small.en and medium.en.
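
The naming scheme can be sketched as a small helper. This is an illustration under the assumptions of this article (five sizes, with .en variants for all but large); the helper model_name is hypothetical, and whisper.available_models() reports the names your installed version actually supports.

```python
# Sizes as described in this article; your installed version may list more.
SIZES = ["tiny", "base", "small", "medium", "large"]
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}  # there is no large.en


def model_name(size, english_only=False):
    """Return the model name to load, appending '.en' where such a variant exists."""
    if size not in SIZES:
        raise ValueError("unknown size: " + repr(size))
    if english_only and size in ENGLISH_ONLY_SIZES:
        return size + ".en"
    return size


for size in SIZES:
    print(size, "->", model_name(size, english_only=True))
```

The returned string is what you would pass to whisper.load_model() or to the --model option on the command line.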

Performance also varies by language and is usually reported as WER (word error rate) or CER (character error rate).
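
Both metrics divide the edit distance between the reference transcript and the model output by the length of the reference, counted in words for WER and in characters for CER. The following is a minimal sketch of that calculation; it is not Whisper's own evaluation code, and real evaluations usually normalize casing and punctuation first, which this example skips.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing in output
                curr[j - 1] + 1,     # insertion: extra item in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```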

How to Transcribe Audio with Whisper?

For command-line usage, Whisper can transcribe audio files using different models:

whisper audio.flac audio.mp3 audio.wav --model medium

The default model works well for transcribing English. To transcribe non-English speech, specify the language with the --language option; adding --task translate translates the speech into English:

whisper japanese.wav --language Japanese --task translate

Use whisper --help to view all options. The available languages are listed in tokenizer.py in the Whisper repository.
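
If you want to drive the command-line tool from a script, you can assemble the same commands in Python. This sketch only uses the --model, --language, and --task options shown above; whisper_command is a hypothetical helper, and the command is printed rather than executed.

```python
import shlex


def whisper_command(files, model=None, language=None, task=None):
    """Build the argument list for the whisper CLI from the options shown above."""
    cmd = ["whisper", *files]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if task:
        cmd += ["--task", task]
    return cmd


cmd = whisper_command(["japanese.wav"], language="Japanese", task="translate")
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

To actually run the command, pass the list to subprocess.run(cmd, check=True); this requires the whisper executable and FFmpeg to be installed.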

Python Usage (Transcription) with Whisper

In Python, transcription can be performed with:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
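
Besides the full text, the dictionary returned by transcribe() contains a "segments" list whose entries carry "start" and "end" times in seconds and the segment "text". The sketch below formats such segments as timestamped lines. It runs on a hand-written sample dictionary (not real model output) so that it works without loading a model; format_timestamp and format_segments are made-up helper names.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Return one '[start --> end] text' line per segment in a transcribe() result."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines


# Hand-written stand-in for the dictionary returned by model.transcribe().
sample = {
    "text": " Hello world. This is a test.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello world."},
        {"start": 2.5, "end": 4.0, "text": " This is a test."},
    ],
}
print("\n".join(format_segments(sample)))
```

With a real model, replace sample with the result of model.transcribe("audio.mp3").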

Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, making autoregressive sequence-to-sequence predictions on each window. The whisper.detect_language() and whisper.decode() functions offer lower-level access:

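import whisper

model = whisper.load_model("base")

# Load the audio and pad or trim it to fit a single 30-second window
audio = whisper.pad_or_trim(whisper.load_audio("audio.mp3"))

# Compute the log-Mel spectrogram and move it to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language; probs maps language codes to probabilities
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the 30-second window with default options and print the text
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)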
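
To get a feel for the 30-second window, the following sketch splits an audio duration into consecutive windows. It is a simplification under stated assumptions: it uses a fixed 30-second stride and a 16 kHz sample rate, whereas the real transcribe() advances the window based on the timestamps it predicts. The function window_bounds is hypothetical and not part of the whisper package.

```python
SAMPLE_RATE = 16000   # assumed: Whisper works on audio resampled to 16 kHz
WINDOW_SECONDS = 30   # window length described above


def window_bounds(duration_seconds, window=WINDOW_SECONDS):
    """Split a duration into consecutive (start, end) windows, in seconds."""
    bounds = []
    start = 0
    while start < duration_seconds:
        end = min(start + window, duration_seconds)
        bounds.append((start, end))
        start += window
    return bounds


print("Samples per full window:", SAMPLE_RATE * WINDOW_SECONDS)
for start, end in window_bounds(70):
    print(f"window from {start}s to {end}s")
```

The last window is shorter than 30 seconds; in the lower-level example above, whisper.pad_or_trim() is what pads such a short chunk to the full window length.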

If you want to master Whisper, check out our full course on speech recognition in Python on the Finxter Academy: 👇

Full Course: OpenAI Whisper – Building Cutting-Edge Python Apps with OpenAI Whisper

Check out our full OpenAI Whisper course with video lessons, easy explanations, GitHub, and a downloadable PDF certificate to prove your speech processing skills to your employer and freelancing clients:

👉 [Academy] Voice-First Development: Building Cutting-Edge Python Apps Powered By OpenAI Whisper