I converted some audio files to spectrograms and saved them to files using the following code:
```python
import os

import librosa
import librosa.display
from matplotlib import pyplot as plt

audio_fpath = "./audios/"
spectrograms_path = "./spectrograms/"
audio_clips = os.listdir(audio_fpath)


def generate_spectrogram(x, sr, save_name):
    # Magnitude STFT in dB; the phase of X is discarded at this point
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    # Borderless figure so the image contains only the spectrogram
    fig = plt.figure(figsize=(20, 20), dpi=1000, frameon=False)
    ax = fig.add_axes([0, 0, 1, 1], frameon=False)
    ax.axis('off')
    librosa.display.specshow(Xdb, sr=sr, cmap='gray', x_axis='time', y_axis='hz', ax=ax)
    # savefig no longer accepts quality=; pass it to Pillow instead
    fig.savefig(save_name, pil_kwargs={"quality": 100})
    plt.close(fig)  # free the figure, otherwise memory grows with every clip
    librosa.cache.clear()


for i in audio_clips:
    audio_length = librosa.get_duration(path=audio_fpath + i)  # filename= in librosa < 0.10
    # One spectrogram per full 60 s chunk
    j = 60
    while j < audio_length:
        x, sr = librosa.load(audio_fpath + i, offset=j - 60, duration=60)
        save_name = spectrograms_path + i + str(j) + ".jpg"
        generate_spectrogram(x, sr, save_name)
        j += 60
    # The loop always exits with j >= audio_length: also save the last 60 s
    j = audio_length
    x, sr = librosa.load(audio_fpath + i, offset=max(0.0, j - 60), duration=60)
    save_name = spectrograms_path + i + str(j) + ".jpg"
    generate_spectrogram(x, sr, save_name)
```
I wanted to keep as much detail and quality from the audio as possible, so that I could turn the spectrograms back into audio without too much loss (the files are 80 MB each).
Is it possible to turn them back to audio files? How can I do it?
I tried using librosa.feature.inverse.mel_to_audio, but it didn't work, and I don't think it applies.
I now have 1300 spectrogram files and want to train a Generative Adversarial Network with them so that I can generate new audio, but I don't want to do it if I won't be able to listen to the results later.
Yes, it is possible to recover most of the signal by estimating the phase with, e.g., the Griffin-Lim algorithm (GLA). A "fast" Python implementation is available in librosa as librosa.griffinlim. Here's how you can use it:
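```python
import numpy as np
import librosa

# librosa.ex() fetches a bundled example clip
# (it replaces librosa.util.example_audio_file(), which newer versions no longer provide)
y, sr = librosa.load(librosa.ex('trumpet'), duration=10)
S = np.abs(librosa.stft(y))    # magnitude only: the phase is thrown away
y_inv = librosa.griffinlim(S)  # estimate a phase and invert back to a waveform
```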
And this is what the original and the reconstruction look like:

[Figure: original and reconstructed signal]
The algorithm by default randomly initialises the phases and then iterates forward and inverse STFT operations to estimate the phases.
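To make that iteration concrete, here is a minimal NumPy-only sketch of the classic (non-"fast") Griffin-Lim loop. It is not librosa's implementation: the names stft, istft, griffin_lim and spectral_convergence, the 512-point periodic Hann window and the 128-sample hop are illustrative assumptions. The inverse STFT is the least-squares weighted overlap-add from Griffin and Lim's paper, which is what lets the loop alternate between "a signal" and "the given magnitude with the current phase".

```python
import numpy as np

# Illustrative (assumed) analysis parameters: 512-point periodic Hann, 75% overlap
N_FFT = 512
HOP = 128
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window


def stft(x):
    """Complex STFT with shape (N_FFT // 2 + 1, n_frames); no padding."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(X):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    frames = np.fft.irfft(X.T, n=N_FFT, axis=1) * WINDOW
    length = N_FFT + HOP * (frames.shape[0] - 1)
    y = np.zeros(length)
    wsum = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * HOP:i * HOP + N_FFT] += frame
        wsum[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    nonzero = wsum > 1e-8  # skip the (near-)zero-weight samples at the very edges
    y[nonzero] /= wsum[nonzero]
    return y


def griffin_lim(mag, n_iter=100, seed=0):
    """Classic Griffin-Lim: random initial phase, then alternate inverse and forward STFTs."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        y = istft(mag * phase)                   # signal whose STFT best matches mag * phase
        phase = np.exp(1j * np.angle(stft(y)))   # keep its phase, discard its magnitude
    return istft(mag * phase)


def spectral_convergence(mag, y):
    """Relative magnitude error of a reconstruction (0 means a perfect match)."""
    return np.linalg.norm(mag - np.abs(stft(y))) / np.linalg.norm(mag)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
    mag = np.abs(stft(x))
    print("random phase:", spectral_convergence(mag, griffin_lim(mag, n_iter=0)))
    print("50 iterations:", spectral_convergence(mag, griffin_lim(mag, n_iter=50)))
```

Comparing n_iter=0 (random phase only) with a few dozen iterations shows the magnitude error shrinking as the phase estimate improves; librosa.griffinlim follows the same idea and adds the "fast" acceleration through its momentum parameter.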
Looking at your code, to reconstruct the signal, you'd just need to do:
```python
import numpy as np
import librosa

# X is the complex STFT computed inside generate_spectrogram()
X_inv = librosa.griffinlim(np.abs(X))
```
It's just an example, of course. As pointed out by @PaulR, in your case you would first need to load the data back from the JPEG files (which are lossy!) and then invert amplitude_to_db (e.g. with librosa.db_to_amplitude) before running Griffin-Lim.
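A rough sketch of that "undo the image" step, under assumptions that the question's code does not guarantee: the gray colormap maps black..white linearly onto the dB range, that range is amplitude_to_db's default top_db=80, and the unknown top level is taken as 0 dB (so the result is only right up to an overall gain). pixels_to_magnitude is a hypothetical helper, and its nearest-neighbour resampling back to the STFT grid is also an assumption.

```python
import numpy as np


def pixels_to_magnitude(pixels, n_bins, n_frames, top_db=80.0, db_max=0.0):
    """Approximate STFT magnitude from a grayscale specshow image (hypothetical helper).

    Assumes black..white maps linearly to [db_max - top_db, db_max] dB and that
    frequency is on a linear axis with low frequencies at the bottom of the image.
    db_max is unknown after saving, so the output is correct only up to a gain.
    """
    img = np.asarray(pixels, dtype=np.float64)[::-1, :]  # flip: row 0 -> lowest frequency bin
    h, w = img.shape
    # Nearest-neighbour resampling, sampling the centre of each target cell
    rows = ((np.arange(n_bins) + 0.5) * h / n_bins).astype(int)
    cols = ((np.arange(n_frames) + 0.5) * w / n_frames).astype(int)
    img = img[np.ix_(rows, cols)]
    db = db_max - top_db + img / 255.0 * top_db  # 8-bit gray gives ~0.31 dB steps over 80 dB
    return 10.0 ** (db / 20.0)  # equivalent to librosa.db_to_amplitude(db)
```

For the question's 60-second chunks with librosa's defaults (sr=22050, n_fft=2048, hop_length=512, center=True) the target grid would be n_bins=1025 and n_frames = 1 + 60 * 22050 // 512 = 2584. The pixels could come from Pillow, e.g. `np.asarray(Image.open(path).convert("L"))`, although a 20000×20000 image may be rejected unless `Image.MAX_IMAGE_PIXELS` is raised. The resulting magnitude then goes into `librosa.griffinlim`. JPEG artefacts and the 8-bit quantisation will still limit the quality, so storing `np.abs(X)` losslessly would avoid most of this guesswork.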
The algorithm, especially the phase estimation, can be further improved thanks to advances in artificial neural networks. Here is one paper that discusses some enhancements.