How can I do real-time voice activity detection in Python?

Tags: python, speech-recognition, speech, speech-to-text, pyaudio

I am running voice activity detection on a recorded audio file to separate the speech and non-speech portions of the waveform.

The output of the classifier looks like (highlighted green regions indicate speech):

[Image: classifier output on the waveform, with speech regions highlighted in green]

The only issue I face is making it work on a stream of audio input (e.g. from a microphone) and doing the analysis in real time over a stipulated time frame.

I know PyAudio can be used to record speech from the microphone dynamically, and there are a couple of real-time visualization examples of a waveform, spectrum, spectrogram, etc., but I could not find anything relevant to carrying out feature extraction in near real time.

745

asked Mar 24 '20 13:03

Nickil Maveli

1 Answer

You should try the Python bindings to Google's WebRTC VAD. It is lightweight and fast, and gives very reasonable results based on GMM modelling. Because the decision is made per frame, latency is minimal.

# Run the VAD on 10 ms of silence. The result should be False.
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness mode: 0 (least) to 3 (most aggressive)

sample_rate = 16000  # Hz
frame_duration = 10  # ms
# 16-bit mono PCM, so each sample of silence is two zero bytes
frame = b'\x00\x00' * int(sample_rate * frame_duration / 1000)
print('Contains speech: %s' % vad.is_speech(frame, sample_rate))
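
WebRTC VAD only accepts 16-bit mono PCM frames of 10, 20 or 30 ms at 8000, 16000, 32000 or 48000 Hz, so a longer buffer has to be cut into frames before it is classified. A minimal sketch of that step follows; `frame_generator` is a hypothetical helper written for illustration and is not part of `webrtcvad`.

```python
def frame_generator(pcm, sample_rate=16000, frame_ms=30):
    """Yield fixed-size frames from a buffer of 16-bit mono PCM bytes.

    A trailing partial frame is dropped, because the VAD needs full frames.
    """
    if frame_ms not in (10, 20, 30):
        raise ValueError("frame_ms must be 10, 20 or 30")
    # Two bytes per sample for 16-bit audio.
    bytes_per_frame = int(sample_rate * frame_ms / 1000) * 2
    last_start = len(pcm) - bytes_per_frame
    for offset in range(0, last_start + 1, bytes_per_frame):
        yield pcm[offset:offset + bytes_per_frame]
```

Each yielded frame can then be passed to `vad.is_speech(frame, sample_rate)` as in the snippet above.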

Also, this article might be useful for you.
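
For the microphone case, one possible approach is to read frame-sized chunks from PyAudio, classify each one, and smooth the per-frame decisions over a short sliding window so that a single misclassified frame does not toggle the state. The sketch below assumes a 16 kHz mono input device. The names `speech_segments`, `microphone_frames` and `run`, and the `window` and `ratio` values, are illustrative choices rather than anything prescribed by `webrtcvad` or PyAudio.

```python
import collections


def speech_segments(frames, is_speech, window=10, ratio=0.9):
    """Yield ("start", i) and ("end", i) events as frames arrive.

    Speech starts once at least `ratio` of the last `window` frames are
    voiced, and ends once at least `ratio` of them are unvoiced.
    """
    recent = collections.deque(maxlen=window)
    triggered = False
    for index, frame in enumerate(frames):
        recent.append(bool(is_speech(frame)))
        if len(recent) < window:
            continue  # not enough history yet
        voiced = sum(recent)
        if not triggered and voiced >= ratio * window:
            triggered = True
            yield ("start", index)
        elif triggered and (window - voiced) >= ratio * window:
            triggered = False
            yield ("end", index)


def microphone_frames(sample_rate=16000, frame_ms=30):
    """Yield raw 16-bit mono frames from the default input device."""
    import pyaudio  # imported lazily; only needed for live capture

    samples_per_frame = int(sample_rate * frame_ms / 1000)
    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=sample_rate,
                        input=True, frames_per_buffer=samples_per_frame)
    try:
        while True:
            yield stream.read(samples_per_frame)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()


def run(sample_rate=16000):
    """Print speech start/end events from the microphone until interrupted."""
    import webrtcvad

    vad = webrtcvad.Vad(2)
    frames = microphone_frames(sample_rate)
    for event, index in speech_segments(
            frames, lambda f: vad.is_speech(f, sample_rate)):
        print(event, "at frame", index)
```

Calling `run()` starts listening and prints an event whenever the smoothed decision changes. A larger `window` gives steadier output at the cost of a longer delay before a change is reported.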

138

answered Sep 18 '22 06:09

igrinis
