I want to feed spectrogram images to a convolutional neural network in an attempt to classify various sounds. I want each image to be exactly 384x128 pixels. However, when I actually save the image it is only 297x98. Here's my code:
def save_spectrogram(num):
    dpi = 128
    x_pixels = 384
    y_pixels = 128
    samples, sr = load_wave(num)
    stft = np.absolute(librosa.stft(samples))
    db = librosa.amplitude_to_db(stft, ref=np.max)
    fig = plt.figure(figsize=(x_pixels//dpi, y_pixels//dpi), dpi=dpi, frameon=False)
    ax = fig.add_subplot(111)
    ax.axes.get_xaxis().set_visible(False)
    ax.axes.get_yaxis().set_visible(False)
    ax.set_frame_on(False)
    librosa.display.specshow(db, y_axis='linear')
    plt.savefig(TRAIN_IMG+str(num)+'.jpg', bbox_inches='tight', pad_inches=0, dpi=dpi)
Does anyone have any pointers on how I can fix this? I've also tried doing it without the subplot, but when I do that it still saves as the wrong size AND has white space/background.
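For reference, the size mismatch mostly comes from bbox_inches='tight', which re-crops the canvas on save. A minimal sketch (not the asker's exact pipeline; it uses random data in place of a spectrogram) of saving a matplotlib figure at an exact pixel size, with the axes filling the whole canvas and no tight cropping:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt

dpi = 128
x_pixels, y_pixels = 384, 128

# figsize is in inches; true division keeps this exact even when
# the pixel size is not a whole multiple of dpi
fig = plt.figure(figsize=(x_pixels / dpi, y_pixels / dpi), dpi=dpi, frameon=False)
ax = fig.add_axes([0, 0, 1, 1])  # axes span the full canvas, no margins
ax.set_axis_off()
ax.imshow(np.random.rand(y_pixels, x_pixels), aspect='auto')

# no bbox_inches='tight': that option re-crops the canvas and changes its size
fig.savefig('exact_size.png', dpi=dpi)
```

Saved this way, the PNG comes out at exactly figsize × dpi pixels (384x128 here).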
To summarize the approach: convert the power spectrogram (amplitude squared) to decibel (dB) units using power_to_db(), display the spectrogram as an image with specshow(), and save it using savefig().
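Under the hood, power_to_db() is just a log-scaling relative to a reference value. A minimal numpy sketch of that conversion (mirroring the idea of librosa.power_to_db(S, ref=np.max), without librosa's exact clipping parameters):

```python
import numpy as np

def power_to_db_sketch(S, amin=1e-10):
    """Convert a power spectrogram to dB relative to its maximum.

    Computes 10 * log10(S / ref) with ref = S.max(), flooring S at
    amin to avoid log(0) -- the same idea as librosa.power_to_db.
    """
    S = np.maximum(S, amin)
    ref = S.max()
    return 10.0 * np.log10(S / ref)

# a toy power spectrogram spanning three decades
S = np.array([[1.0, 10.0], [100.0, 1000.0]])
db = power_to_db_sketch(S)
# the loudest bin maps to 0 dB; everything else is negative
```

Because the values are referenced to the maximum, the dB array always tops out at 0, which makes later min-max scaling predictable.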
Plots are for humans to look at, and contain things like axis markers and labels that are not useful for machine learning. To feed a model with an 'image' of the spectrogram, one should output only the data. This data can be stored in any format, but if you want to use a standard image format you should use PNG: lossy compression such as JPEG introduces compression artifacts.
Here follows working example code to save a spectrogram. Note that to get a fixed-size image output, the code extracts a fixed-length window of the audio signal. Dividing an audio stream into such fixed-length analysis windows is standard practice.
import librosa
import numpy
import skimage.io

def scale_minmax(X, min=0.0, max=1.0):
    X_std = (X - X.min()) / (X.max() - X.min())
    X_scaled = X_std * (max - min) + min
    return X_scaled

def spectrogram_image(y, sr, out, hop_length, n_mels):
    # use log-melspectrogram
    mels = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                          n_fft=hop_length*2, hop_length=hop_length)
    mels = numpy.log(mels + 1e-9)  # add small number to avoid log(0)

    # min-max scale to fit inside 8-bit range
    img = scale_minmax(mels, 0, 255).astype(numpy.uint8)
    img = numpy.flip(img, axis=0)  # put low frequencies at the bottom in image
    img = 255 - img  # invert. make black==more energy

    # save as PNG
    skimage.io.imsave(out, img)

if __name__ == '__main__':
    # settings
    hop_length = 512  # number of samples per time-step in spectrogram
    n_mels = 128  # number of bins in spectrogram. Height of image
    time_steps = 384  # number of time-steps. Width of image

    # load audio. Using example from librosa
    # (in librosa >= 0.8, use librosa.example() instead)
    path = librosa.util.example_audio_file()
    y, sr = librosa.load(path, offset=1.0, duration=10.0, sr=22050)
    out = 'out.png'

    # extract a fixed length window
    start_sample = 0  # starting at beginning
    length_samples = time_steps*hop_length
    window = y[start_sample:start_sample+length_samples]

    # convert to PNG
    spectrogram_image(window, sr=sr, out=out, hop_length=hop_length, n_mels=n_mels)
    print('wrote file', out)
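One caveat with the uint8 encoding above: the min-max scaling discards the absolute dB range. If the model (or later analysis) needs the original log-mel values rather than just images, one option is to store the min/max alongside the PNG and invert the mapping when loading. A small numpy sketch of that inverse, ignoring the vertical flip for brevity (mels_from_image is an illustrative helper name, not part of the answer's code):

```python
import numpy as np

def mels_from_image(img, mels_min, mels_max):
    """Invert the uint8 encoding used above: undo the 255-img inversion
    and map [0, 255] back to the original [mels_min, mels_max] range."""
    restored = (255 - img.astype(np.float64)) / 255.0
    return restored * (mels_max - mels_min) + mels_min

# round-trip check on fake log-mel data
mels = np.linspace(-20.0, 3.0, 12).reshape(3, 4)
scaled = (mels - mels.min()) / (mels.max() - mels.min()) * 255.0
img = 255 - scaled.astype(np.uint8)  # same encoding as spectrogram_image()
recovered = mels_from_image(img, mels.min(), mels.max())
# recovery is accurate to within one quantization step of the 23 dB range
```

The residual error is bounded by one 8-bit quantization step (range / 255), which is usually negligible next to the variation in the spectrogram itself.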