Can I convert spectrograms generated with librosa back to audio?

I converted some audio files to spectrograms and saved them to files using the following code:

import os
from matplotlib import pyplot as plt
import librosa
import librosa.display
import IPython.display as ipd

audio_fpath = "./audios/"
spectrograms_path = "./spectrograms/"
audio_clips = os.listdir(audio_fpath)

def generate_spectrogram(x, sr, save_name):
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    fig = plt.figure(figsize=(20, 20), dpi=1000, frameon=False)
    ax = fig.add_axes([0, 0, 1, 1], frameon=False)
    ax.axis('off')
    librosa.display.specshow(Xdb, sr=sr, cmap='gray', x_axis='time', y_axis='hz')
    plt.savefig(save_name, quality=100, bbox_inches=0, pad_inches=0)
    librosa.cache.clear()

for i in audio_clips:
    audio_fpath = "./audios/"
    spectrograms_path = "./spectrograms/"
    audio_length = librosa.get_duration(filename=audio_fpath + i)
    j=60
    while j < audio_length:
        x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
        save_name = spectrograms_path + i + str(j) + ".jpg"
        generate_spectrogram(x, sr, save_name)
        j += 60
        if j >= audio_length:
            j = audio_length
            x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
            save_name = spectrograms_path + i + str(j) + ".jpg"
            generate_spectrogram(x, sr, save_name)

I wanted to preserve as much detail and quality from the audio as possible, so that I could turn the spectrograms back into audio without too much loss (they are 80 MB each).

Is it possible to turn them back to audio files? How can I do it?

Example spectrograms

I tried using librosa.feature.inverse.mel_to_audio, but it didn't work, and I don't think it applies.

I now have 1300 spectrogram files and want to train a Generative Adversarial Network on them so that I can generate new audio, but I don't want to do that if I won't be able to listen to the results later.

asked Apr 10 '20 01:04

Ramon Griffo

1 Answer

Yes, it is possible to recover most of the signal by estimating the phase with, for example, the Griffin-Lim algorithm (GLA). A "fast" implementation for Python is available in librosa. Here's how you can use it:

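import numpy as np
import librosa

# Load 10 s of a bundled example clip
# (librosa.util.example_audio_file() only exists in older librosa releases)
y, sr = librosa.load(librosa.ex('brahms'), duration=10)
# Keep only the STFT magnitude; the phase is discarded here
S = np.abs(librosa.stft(y))
# Estimate the missing phase with Griffin-Lim and invert to a waveform
y_inv = librosa.griffinlim(S)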

And this is what the original and the reconstruction look like:

reconstruction

By default, the algorithm initialises the phases randomly and then alternates forward and inverse STFT operations to estimate them.
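
To make that iteration concrete, here is a minimal NumPy-only sketch of the basic Griffin-Lim loop. It is not librosa's implementation: the helper names `stft`, `istft` and `griffin_lim`, the settings `n_fft=512` and `hop=128`, and the synthetic two-tone test signal are all assumptions chosen for illustration, and the momentum term used by the "fast" variant is left out.

```python
import numpy as np

def _hann(n_fft):
    # Periodic Hann window
    return np.hanning(n_fft + 1)[:-1]

def stft(y, n_fft=512, hop=128):
    """Centered STFT: complex array of shape (n_fft // 2 + 1, n_frames)."""
    window = _hann(n_fft)
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

def istft(X, n_fft=512, hop=128):
    """Inverse STFT: windowed overlap-add, normalised by the summed squared window."""
    window = _hann(n_fft)
    n_frames = X.shape[1]
    out_len = n_fft + hop * (n_frames - 1)
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    frames = np.fft.irfft(X, n=n_fft, axis=0)
    for i in range(n_frames):
        y[i * hop : i * hop + n_fft] += frames[:, i] * window
        norm[i * hop : i * hop + n_fft] += window ** 2
    y /= np.maximum(norm, 1e-8)
    return y[n_fft // 2 : out_len - n_fft // 2]  # drop the centering padding

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    # Start from uniformly random phases
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        # Inverse STFT with the current phase guess ...
        y = istft(mag * phase, n_fft, hop)
        # ... then a forward STFT, keeping only the new phase
        phase = np.exp(1j * np.angle(stft(y, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

# Synthetic test signal (assumed): two sine tones, 4096 samples at 8 kHz
sr = 8000
t = np.arange(4096) / sr
y = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1320 * t)

mag = np.abs(stft(y))     # magnitude only, phase thrown away
y_rec = griffin_lim(mag)  # waveform with estimated phase
```

The reconstruction will generally not match the original sample by sample, because the true phase is gone; what the loop tries to match is the magnitude spectrogram.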

Looking at your code, to reconstruct the signal, you'd just need to do:

import numpy as np
import librosa

# X is the complex STFT computed inside generate_spectrogram
X_inv = librosa.griffinlim(np.abs(X))

It's just an example, of course. As pointed out by @PaulR, in your case you would first need to load the data from the JPEG files (which are lossy!) and then undo amplitude_to_db (for example with librosa.db_to_amplitude) before running Griffin-Lim.
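
As a rough sketch of that image-to-magnitude step, the function below maps grayscale pixel values back to relative linear magnitudes using only NumPy. Everything here is an assumption rather than something taken from your files: the name `image_to_magnitude` is hypothetical, black is taken to mean the lowest dB value and white the highest (which is what `cmap='gray'` does when no limits are given), the dB span is taken to be the 80 dB that `amplitude_to_db` clips to by default, and the frequency axis is taken to be linear as with `y_axis='hz'`. The absolute level is not stored in the image, so the result is only correct up to an overall volume factor. The image itself would first have to be read as a 2-D grayscale array, for instance with Pillow's `Image.open(path).convert("L")`.

```python
import numpy as np

def image_to_magnitude(pixels, n_freq_bins=1025, n_frames=None, db_range=80.0):
    """Convert a grayscale spectrogram image to relative linear STFT magnitudes.

    pixels: 2-D uint8 array, row 0 = top of the image (highest frequency).
    n_freq_bins: 1025 corresponds to librosa's default n_fft=2048.
    """
    # Flip vertically so that row 0 becomes the lowest frequency bin
    img = np.asarray(pixels, dtype=np.float64)[::-1, :]
    if n_frames is None:
        n_frames = img.shape[1]
    # Nearest-neighbour resampling from image pixels back to the STFT grid
    rows = np.round(np.linspace(0, img.shape[0] - 1, n_freq_bins)).astype(int)
    cols = np.round(np.linspace(0, img.shape[1] - 1, n_frames)).astype(int)
    img = img[np.ix_(rows, cols)]
    # Assumed mapping: pixel 0 -> -db_range dB, pixel 255 -> 0 dB
    db = (img / 255.0 - 1.0) * db_range
    # Invert dB = 20 * log10(amplitude)
    return 10.0 ** (db / 20.0)

# Tiny synthetic "image": dark on top (high frequencies), bright at the bottom
demo = np.array([[0, 0], [255, 255]], dtype=np.uint8)
S = image_to_magnitude(demo, n_freq_bins=2)
```

With real data, `n_frames` would be set to the number of STFT frames of the original clip (about 2584 for 60 s at librosa's default 22050 Hz sample rate and hop length of 512), and the returned array could then be passed to `librosa.griffinlim`. Expect audible artifacts from the JPEG compression and the resampling; saving the `Xdb` array itself (e.g. with `np.save`) instead of an image would avoid both.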

The algorithm, especially the phase estimation, can be further improved thanks to advances in artificial neural networks. Here is one paper that discusses some enhancements.

answered Oct 17 '22 07:10

Lukasz Tracewski

Tags: python, signal-processing, audio, librosa, spectrogram
