I am trying to get librosa to work with microphone input instead of just a wav file, and I have been running into a few problems. I use the pyaudio library to connect to the microphone, but I am having trouble translating this data into something librosa can use. Any suggestions on how this should be approached, or is it even possible?
One thing I tried was to receive data from the pyaudio mic, decode it into an array of floats, and pass it to librosa (according to the docs, this is what librosa does with wav files in .load). It doesn't work and produces the following error: "librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere"
FORMAT = pyaudio.paInt16
RATE = 44100
CHUNK = 2048
WIDTH = 2
CHANNELS = 2
RECORD_SECONDS = 5
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    output=True,
                    frames_per_buffer=CHUNK)
while True:
    data = stream.read(CHUNK)
    data_float = np.fromstring(data, dtype=np.float16)
    data_np = np.array(data_float, dtype='d')
    # data in 1D array
    mfcc = librosa.feature.mfcc(data_np.flatten(), 44100)
    print(mfcc)
librosa uses soundfile and audioread for reading audio. As of v0.7, librosa uses soundfile by default, and falls back on audioread only when dealing with codecs unsupported by soundfile (notably MP3 and some variants of WAV).
To add to the above answer, you can also use the librosa function librosa.get_duration(y=y, sr=sr) to get the duration of the audio in seconds. Alternatively, len(y)/sr gives the same duration for a mono signal.
You can do it using pyaudio's callback function. I think it's easier using a class.
In the constructor __init__, you define all the constants you need and set FORMAT to pyaudio.paFloat32, which will let you use the data with librosa later.
Then, in the start method, I open the audio stream. The stream_callback parameter of .open() lets you specify the function that handles each incoming buffer.
The callback method takes in_data, frame_count, time_info, and flag as arguments. in_data arrives as raw bytes, so you need np.frombuffer(in_data, dtype=np.float32) to convert it into a NumPy array.
Once this is done, you can use your numpy.ndarray as you normally would with librosa.
I think this can be optimized, but this solution works fine for me. Hope it helps :)
import time

import librosa
import numpy as np
import pyaudio


class AudioHandler(object):
    def __init__(self):
        self.FORMAT = pyaudio.paFloat32  # float32 samples, usable by librosa as-is
        self.CHANNELS = 1
        self.RATE = 44100
        self.CHUNK = 1024 * 2
        self.p = None
        self.stream = None

    def start(self):
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format=self.FORMAT,
                                  channels=self.CHANNELS,
                                  rate=self.RATE,
                                  input=True,
                                  output=False,
                                  stream_callback=self.callback,
                                  frames_per_buffer=self.CHUNK)

    def stop(self):
        self.stream.stop_stream()
        self.stream.close()
        self.p.terminate()

    def callback(self, in_data, frame_count, time_info, flag):
        # in_data is raw bytes; reinterpret it as float32 samples
        numpy_array = np.frombuffer(in_data, dtype=np.float32)
        # pass y and sr by keyword; recent librosa versions require it
        librosa.feature.mfcc(y=numpy_array, sr=self.RATE)
        return None, pyaudio.paContinue

    def mainloop(self):
        # runs while the stream is active; add your own stop condition (e.g. a button) if needed
        while self.stream.is_active():
            time.sleep(2.0)


audio = AudioHandler()
audio.start()     # open the stream
audio.mainloop()  # main operations with librosa
audio.stop()
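If the stream has to stay in pyaudio.paInt16 format, as in the question, the bytes can be decoded as int16 and scaled to floats instead of being reinterpreted as float16. The following is only a sketch under that assumption: the helper name pcm16_to_float32 and the simulated chunk are hypothetical, and the chunk stands in for what stream.read(CHUNK) would return. It also shows why the original attempt can fail: some int16 bit patterns read as float16 are inf or NaN, which is what librosa's "not finite everywhere" check rejects.

```python
import numpy as np


def pcm16_to_float32(raw_bytes, channels=1):
    """Convert interleaved 16-bit PCM bytes to mono float32 samples in [-1.0, 1.0)."""
    # Interpret the bytes as signed 16-bit integers, matching pyaudio.paInt16.
    samples = np.frombuffer(raw_bytes, dtype=np.int16)
    # Interleaved audio: one row per frame, one column per channel.
    frames = samples.reshape(-1, channels)
    # Scale to floating point, then average the channels down to mono.
    return (frames.astype(np.float32) / 32768.0).mean(axis=1)


# Simulated stereo chunk (hypothetical stand-in for stream.read(CHUNK)).
fake_chunk = np.array([0, 0, 16384, 16384, -32768, -32768], dtype=np.int16).tobytes()
mono = pcm16_to_float32(fake_chunk, channels=2)

# The int16 value 0x7C00 has the bit pattern of +inf when read as float16,
# so reinterpreting PCM bytes as float16 can yield non-finite samples.
bad = np.frombuffer(np.array([0x7C00], dtype=np.int16).tobytes(), dtype=np.float16)
```

The resulting mono array can then be passed to librosa by keyword, for example librosa.feature.mfcc(y=mono, sr=RATE), with RATE being the sample rate used to open the stream.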