5 Best Ways to Convert a WAV File to a Spectrogram in Python 3

💡 Problem Formulation: Converting a WAV file into a spectrogram is a common task in audio processing that involves generating a visual representation of the spectrum of frequencies in the audio file as they vary with time. Input is a WAV file, e.g., ‘sample.wav’, and the desired output is a spectrogram visualization, typically as an image file.

Method 1: Using matplotlib and scipy

Matplotlib, a popular plotting library, in conjunction with scipy, a scientific computing library, can be utilized to convert a WAV file to a spectrogram. This method involves reading the audio data with scipy and plotting the spectrogram using matplotlib’s specgram method, which provides a simple interface for spectrogram generation.

Here’s an example:

import matplotlib.pyplot as plt
from scipy.io import wavfile

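# Read WAV file
sample_rate, samples = wavfile.read('sample.wav')

# specgram expects a 1-D signal, so average stereo channels down to mono
if samples.ndim > 1:
    samples = samples.mean(axis=1)

# Generate spectrogram
plt.specgram(samples, Fs=sample_rate)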
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.title('Spectrogram')
plt.show()

Output: A window displaying the spectrogram of the ‘sample.wav’ file.

The code uses wavfile.read to read the WAV file’s sample rate and data. It then calls plt.specgram to create the spectrogram, setting the sampling frequency to the file’s sample rate. Finally, axes labels and a title are set before displaying the plot using plt.show().
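If the goal is an image file rather than a window, the same plot can be written to disk with savefig. The sketch below is only an illustration: it generates a one-second 440 Hz tone as a stand-in for 'sample.wav', and the temporary directory, the file name 'spectrogram.png', and the Agg backend are all assumptions made so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile

# Hypothetical stand-in for 'sample.wav': one second of a 440 Hz tone
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

work_dir = tempfile.mkdtemp()
wav_path = os.path.join(work_dir, "sample.wav")
png_path = os.path.join(work_dir, "spectrogram.png")
wavfile.write(wav_path, sample_rate, tone)

# Read the file back, reducing multi-channel audio to mono if necessary
sample_rate_read, samples = wavfile.read(wav_path)
if samples.ndim > 1:
    samples = samples.mean(axis=1)

# Draw the spectrogram and save it as a PNG instead of showing it
fig, ax = plt.subplots()
ax.specgram(samples, Fs=sample_rate_read)
ax.set_xlabel("Time [s]")
ax.set_ylabel("Frequency [Hz]")
ax.set_title("Spectrogram")
fig.savefig(png_path, dpi=150)
plt.close(fig)
```

Replacing the synthetic tone with a real file only requires pointing wavfile.read at that file's path.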

Method 2: Using librosa

Librosa is a library for audio and music analysis in Python. Converting a WAV file to a spectrogram with librosa involves computing the Short-Time Fourier Transform (STFT) with librosa.stft and then converting the magnitudes of the resulting complex values to a decibel scale.

Here’s an example:

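import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load WAV file (by default librosa converts to mono and resamples to 22,050 Hz;
# pass sr=None to keep the file's native sample rate)
y, sr = librosa.load('sample.wav')

# Compute the STFT and convert its magnitude to decibels relative to the peak
S = librosa.stft(y)
D = librosa.amplitude_to_db(np.abs(S), ref=np.max)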

# Plot spectrogram
plt.figure(figsize=(10, 4))
librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.show()

Output: A window displaying the spectrogram with a logarithmic frequency axis.

This snippet first loads the WAV file using librosa.load. It then calculates the STFT using librosa.stft and converts it to a decibel scale with librosa.amplitude_to_db. The result is displayed using librosa.display.specshow, which handles the complex plotting aspects of the spectrogram.
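To make the decibel step less of a black box, here is a rough NumPy sketch of what a conversion such as amplitude_to_db with ref=np.max does: take 20·log10 of the magnitude relative to the peak and clip everything more than 80 dB below it. The helper name amplitude_to_db_sketch is hypothetical, and the amin and top_db defaults are written from the documented behaviour rather than copied from librosa, so treat it as an approximation.

```python
import numpy as np

def amplitude_to_db_sketch(magnitude, amin=1e-5, top_db=80.0):
    """Hypothetical helper: magnitudes -> dB relative to the largest value."""
    magnitude = np.asarray(magnitude, dtype=float)
    ref = magnitude.max()
    # Floor tiny values at amin so log10 never sees zero
    db = 20.0 * np.log10(np.maximum(amin, magnitude))
    db -= 20.0 * np.log10(np.maximum(amin, ref))
    # Limit the dynamic range to top_db below the peak
    return np.maximum(db, db.max() - top_db)

# Toy 2x2 "magnitude spectrogram" with a zero in it
magnitude = np.array([[1.0, 0.1], [0.01, 0.0]])
db = amplitude_to_db_sketch(magnitude)
```

For this toy input the peak maps to 0 dB, the values 0.1 and 0.01 land at about -20 dB and -40 dB, and the zero entry is clipped to -80 dB instead of becoming negative infinity.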

Method 3: Using scipy.signal and matplotlib

For more control over the computation, scipy.signal.spectrogram exposes the segment length and overlap directly. It splits the signal into overlapping segments, computes the Fourier transform of each one, and returns the frequency axis, the time axis, and the spectrogram matrix. numpy then converts the values to decibels and matplotlib plots them.

Here’s an example:

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

# Load WAV file
sample_rate, samples = wavfile.read('sample.wav')
if samples.ndim > 1:
    samples = samples.mean(axis=1)  # average stereo channels down to mono

# Define segment length and overlap (in samples)
segment_length = 1024
overlap = 512
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate,
                                                     nperseg=segment_length,
                                                     noverlap=overlap)

# Plot spectrogram in decibels (the small offset avoids log10(0) for silent segments)
plt.pcolormesh(times, frequencies, 10 * np.log10(spectrogram + 1e-12))
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.title('Spectrogram')
plt.colorbar(label='Intensity [dB]')
plt.show()

Output: A window displaying the spectrogram with color intensity representing magnitude.

The code reads the WAV file and defines the segment length and overlap for analysis. Then signal.spectrogram is used to create the spectrogram data, taking into account the overlap and segment length. The data is plotted using plt.pcolormesh with a color scale indicative of intensity.
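For readers curious about what happens inside such a call, the segment-by-segment Fourier transform can be sketched with numpy alone. The function name manual_spectrogram is hypothetical, and the Hann window and unscaled power values are simplifying assumptions, so the numbers will not match scipy's output exactly; the structure (slice, window, FFT, square) is the point.

```python
import numpy as np

def manual_spectrogram(samples, sample_rate, segment_length=1024, overlap=512):
    """Hypothetical helper: power spectrogram from overlapping windowed FFTs."""
    samples = np.asarray(samples, dtype=float)
    hop = segment_length - overlap
    if len(samples) < segment_length:
        raise ValueError("signal is shorter than one segment")
    n_segments = 1 + (len(samples) - segment_length) // hop
    window = np.hanning(segment_length)

    # Slice the signal into overlapping segments and taper each one
    frames = np.stack([
        samples[i * hop : i * hop + segment_length] * window
        for i in range(n_segments)
    ])

    # One real FFT per segment; squared magnitude gives (unscaled) power
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    frequencies = np.fft.rfftfreq(segment_length, d=1.0 / sample_rate)
    times = (np.arange(n_segments) * hop + segment_length / 2) / sample_rate
    return frequencies, times, power.T  # rows: frequency bins, columns: time

# Synthetic test signal: one second of a 1 kHz tone sampled at 8 kHz
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

frequencies, times, power = manual_spectrogram(tone, sample_rate)
peak_frequency = frequencies[np.argmax(power.mean(axis=1))]
```

On the synthetic tone, the strongest frequency bin sits at 1 kHz, which is a quick sanity check that the framing and FFT are wired up correctly. The returned arrays can be passed to plt.pcolormesh in the same way as scipy's.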

Method 4: Using PyDub and matplotlib

PyDub is a high-level audio library that can be combined with matplotlib for spectrogram visualization. This method includes exporting the audio data into a format compatible with numpy arrays and then using familiar matplotlib plotting calls.

Here’s an example:

from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np

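# Load WAV file and mix down to mono (for stereo audio,
# get_array_of_samples would otherwise return interleaved channels)
audio = AudioSegment.from_file('sample.wav').set_channels(1)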

# Convert to numpy array
samples = np.array(audio.get_array_of_samples())

# Generate spectrogram
plt.specgram(samples, Fs=audio.frame_rate)
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.title('Spectrogram')
plt.show()

Output: A window displaying the spectrogram of ‘sample.wav’.

The snippet loads the WAV file using PyDub, extracts the sample array with audio.get_array_of_samples(), and then generates a spectrogram using matplotlib’s plt.specgram. Because AudioSegment.from_file can also read other formats (through ffmpeg), the same code works for MP3 and similar files.
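One detail worth knowing: for multi-channel audio, the flat array returned by get_array_of_samples() holds the channels interleaved (left, right, left, right, ...), and feeding that straight into a spectrogram mixes the channels up. The numpy sketch below shows one way to separate and average them. The helper name interleaved_to_mono is hypothetical, and the toy array stands in for real PyDub output; with PyDub the channel count is available as audio.channels.

```python
import numpy as np

def interleaved_to_mono(interleaved, channels):
    """Hypothetical helper: average interleaved channel samples into one signal."""
    # Each row becomes one time step, each column one channel
    frames = np.asarray(interleaved, dtype=float).reshape(-1, channels)
    return frames.mean(axis=1)

# Toy stereo data: (L, R) pairs stored as L0, R0, L1, R1, L2, R2
stereo = np.array([100, 300, -200, 0, 50, 50])
mono = interleaved_to_mono(stereo, channels=2)
```

The resulting mono array has one value per time step and can be passed to plt.specgram together with audio.frame_rate.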

Bonus One-Liner Method 5: Using pyplot’s specgram method directly

If the WAV file data is already available as a NumPy array, creating a spectrogram can be a simple one-liner using matplotlib’s specgram method directly.

Here’s an example:

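import matplotlib.pyplot as plt
# Assumes samples (1-D NumPy array) and sample_rate are already defined
plt.specgram(samples, Fs=sample_rate)
plt.show()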

Output: A window displaying the spectrogram of the given samples.

The one-liner assumes that samples is a predefined one-dimensional NumPy array of audio samples and that sample_rate holds the sample rate. It is the same plt.specgram call used in Method 1, without the file-loading step.
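The specgram call also accepts a few parameters and returns the computed data, which helps when the default look is not quite right. The sketch below uses a synthetic chirp as a stand-in for real audio; the NFFT, noverlap, and cmap values are arbitrary illustrative choices, and the Agg backend is assumed so that no window is required.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for audio: a two-second chirp sweeping from 500 Hz to about 3500 Hz
sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
samples = np.sin(2 * np.pi * (500 * t + 750 * t ** 2))

fig, ax = plt.subplots()
# NFFT sets the segment length, noverlap the overlap between segments (both in samples)
spectrum, freqs, times, image = ax.specgram(
    samples, Fs=sample_rate, NFFT=512, noverlap=384, cmap="magma"
)
fig.colorbar(image, ax=ax, label="Intensity [dB]")
ax.set_xlabel("Time [s]")
ax.set_ylabel("Frequency [Hz]")
plt.close(fig)
```

A larger NFFT gives finer frequency resolution at the cost of time resolution, and a larger noverlap produces more time steps. The returned spectrum array holds the linear power values (the image shows them in decibels), with one row per entry in freqs and one column per entry in times.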

Summary/Discussion

Method 1: matplotlib and scipy. Strengths: Easy to use, great for quick visualizations. Weaknesses: Reads WAV files only, and multi-channel audio must be reduced to one channel first.
Method 2: librosa. Strengths: Designed for audio analysis, offers a variety of additional features. Weaknesses: A heavier dependency, and its defaults (mono conversion, resampling to 22,050 Hz) can surprise beginners.
Method 3: scipy.signal and matplotlib. Strengths: Provides control over the spectrogram calculation. Weaknesses: More complex and requires an understanding of signal processing concepts such as segment length and overlap.
Method 4: PyDub and matplotlib. Strengths: Simplifies audio data handling, supports multiple formats. Weaknesses: Requires an extra dependency, plus ffmpeg for formats other than WAV.
Method 5: One-liner matplotlib. Strengths: Quick and efficient for preloaded data. Weaknesses: Assumes prior extraction and loading of audio data.