Converting Audio files between Pydub and Librosa

Question

I'm trying to open an audio file, trim it with Librosa, and then modify it using pydub. This is my code:

sound = AudioSegment.from_file(filePath)
samples = sound.get_array_of_samples()
arr = np.array(samples)
print(type(arr))
print(arr)
# then modify samples...
y, index = librosa.effects.trim(arr)

The problem is that even if I follow the solution outlined here:
https://github.com/jiaaro/pydub/issues/289

I can't seem to use librosa's trim() function. It's giving me this error:

librosa.util.exceptions.ParameterError: Audio data must be floating-point

The reason is that Librosa expects (and works with) a floating-point numpy array, while pydub exports an integer array (which I convert to a numpy array). I don't know how to convert the array between the two.

I can export to file from Pydub and then load it with Librosa - but that seems like a very inefficient way of doing things.

Package Versions:
Librosa - 0.7.1
Pydub - 0.23.1

Anil_M · Accepted Answer

Librosa is complaining that the data in arr is of type int; you need to convert it to float, as below:

arr = np.array(samples).astype(np.float32)

Code:

import librosa
import numpy as np

from pydub import AudioSegment

sound = AudioSegment.from_file("test.wav")
samples = sound.get_array_of_samples()
# librosa requires floating-point input, so cast the integer samples
arr = np.array(samples).astype(np.float32)
print(type(arr))
#print(arr)
# then modify samples...
y, index = librosa.effects.trim(arr)
print(index)
print(y)

Trimmed output

 <type 'numpy.ndarray'>
[  0 882]
[ 0.00000000e+00  0.00000000e+00  1.07629056e+08  1.07629056e+08
:
:
1.09489754e+09  1.09489754e+09]
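
Since the goal in the question is to keep working in pydub after trimming, the index returned by librosa.effects.trim can be mapped back onto the original AudioSegment. A minimal sketch, assuming mono audio (so index counts samples at sound.frame_rate) and relying on pydub slicing segments in milliseconds; trim_index_to_ms is a hypothetical helper name, not part of either library:

```python
def trim_index_to_ms(index, frame_rate):
    """Convert a (start, end) sample interval into whole milliseconds."""
    start, end = (int(i) for i in index)
    # integer arithmetic avoids float rounding; partial milliseconds are dropped
    return start * 1000 // frame_rate, end * 1000 // frame_rate


# Example with the interval printed above, assuming a 44.1 kHz file
start_ms, end_ms = trim_index_to_ms((0, 882), 44100)
print(start_ms, end_ms)  # 0 20
```

With pydub this would be used as trimmed = sound[start_ms:end_ms], after which the usual pydub operations apply to trimmed.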

Paul Totzke · Answer

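import numpy as np

def audiosegment_to_librosawav(audiosegment):
    # one array of integer samples per channel
    channel_sounds = audiosegment.split_to_mono()
    samples = [s.get_array_of_samples() for s in channel_sounds]

    # shape (n_frames, n_channels), cast to float32
    fp_arr = np.array(samples).T.astype(np.float32)
    # scale by the maximum of the integer sample type, to roughly [-1.0, 1.0]
    fp_arr /= np.iinfo(samples[0].typecode).max
    # flatten to 1-D (for mono input this is just the sample sequence)
    fp_arr = fp_arr.reshape(-1)

    return fp_arr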

I use this code for resemblyzer which uses librosa. 90% of the code is here: https://github.com/jiaaro/pydub/blob/master/API.markdown#audiosegmentget_array_of_samples

Anil_M's code didn't convert the numbers to floats for me.
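
For multi-channel audio, a variation on the function above keeps the channels separate instead of flattening them, giving the (channels, samples) layout that librosa uses for multi-channel signals. This is only a sketch under assumptions: the samples are signed and interleaved (as get_array_of_samples() returns them), sample_width is the number of bytes per sample (AudioSegment.sample_width), and pcm_to_float is a hypothetical name:

```python
import numpy as np


def pcm_to_float(samples, sample_width, channels=1):
    """Scale signed integer PCM samples to float32 in [-1.0, 1.0).

    Returns shape (n_samples,) for mono, (channels, n_samples) otherwise.
    """
    # full scale of a signed integer with this many bytes, e.g. 32768 for 16-bit
    full_scale = float(1 << (8 * sample_width - 1))
    arr = np.asarray(samples, dtype=np.float32) / full_scale
    if channels > 1:
        # interleaved L R L R ... -> one row per channel
        arr = arr.reshape(-1, channels).T
    return arr


# Synthetic 16-bit stereo frames: (left, right) pairs
stereo = pcm_to_float([0, 16384, -32768, 8192], sample_width=2, channels=2)
print(stereo.shape)  # (2, 2)
```

With pydub that would be called as pcm_to_float(sound.get_array_of_samples(), sound.sample_width, sound.channels); averaging over axis 0 gives a mono signal if that is what the downstream code expects.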

Zabir Al Nazi · Answer

Librosa loads audio as float32 samples, while pydub exposes them as integers (int16 for 16-bit audio).

So, the conversion is simply:

from pydub import AudioSegment
import librosa
import numpy as np

a = AudioSegment.from_wav("test.wav")
b, sr = librosa.load("test.wav")
# librosa (float32) to the int16 scale used by pydub
b_p = np.array(b * (1<<15), dtype=np.int16)
# pydub samples as an int16 numpy array
a_p = np.array(a.get_array_of_samples(), dtype=np.int16)
print(b_p)
print(a_p)

array([  7,   9,   8, ..., -12, -46,   0], dtype=int16)
array([  7,   9,   8, ..., -12, -46,   0], dtype=int16)

def convert(filename):
    y, sr = librosa.load(filename)
    # convert from float32 to int16
    y = np.array(y * (1<<15), dtype=np.int16)
    # build a mono AudioSegment from the raw int16 bytes
    audio_segment = AudioSegment(
        y.tobytes(),
        frame_rate=sr,
        sample_width=y.dtype.itemsize,
        channels=1
    )
    return audio_segment
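
One caveat with multiplying by 1<<15: a sample of exactly 1.0 (or anything louder) scales to 32768, which does not fit in int16. A hedged sketch that clips before the cast; float_to_int16 is a hypothetical helper, and truncation toward zero is kept as in the code above:

```python
import numpy as np


def float_to_int16(y):
    """Convert float audio in [-1.0, 1.0] to int16, clipping out-of-range values."""
    scaled = np.asarray(y, dtype=np.float64) * (1 << 15)
    # int16 covers -32768..32767, so +1.0 has to be clipped to 32767
    return np.clip(scaled, -32768, 32767).astype(np.int16)


print(float_to_int16(np.array([0.0, 0.5, 1.0, -1.0])))
```

The result can be passed to AudioSegment via .tobytes() in the same way as in convert() above.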

Tags: python, numpy, audio, librosa, pydub

Igor Q.
