
Any simple VAD implementation?

I'm looking for some C/C++ code for VAD (Voice Activity Detection).

Basically, my application is reading PCM frames from the device. I would like to know when the user is talking. I'm not looking for any speech recognition algorithm but only for voice detection.

I would like to know when the user is talking and when he finishes:

bool isVAD(short* pcm,size_t count);

Gilad Novik asked Mar 20 '11 07:03




2 Answers

Google's open-source WebRTC code has a VAD module written in C. It uses a Gaussian Mixture Model (GMM), which is typically much more effective than a simple energy-threshold detector, especially in a situation with dynamic levels and types of background noise. In my experience it's also much more effective than the Moattar-Homayounpour VAD that Gilad mentions in their comment.

The VAD code is part of the much, much larger WebRTC repository, but it's very easy to pull it out and compile it on its own. E.g. the webrtcvad Python wrapper includes just the VAD C source.

The WebRTC VAD API is very easy to use. First, the audio must be mono 16-bit PCM, with an 8 kHz, 16 kHz or 32 kHz sample rate. Each frame of audio that you send to the VAD must be 10, 20 or 30 milliseconds long.
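The frame sizes implied by these constraints can be worked out with a small helper. This is only a sketch: the function names below are hypothetical and are not part of the WebRTC API, and the accepted rates are simply the ones listed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of WebRTC): true if the rate is one of
 * the sample rates listed above. */
static bool vad_rate_supported(int sample_rate_hz) {
    return sample_rate_hz == 8000 || sample_rate_hz == 16000 ||
           sample_rate_hz == 32000;
}

/* Hypothetical helper: true if the frame duration is 10, 20 or 30 ms. */
static bool vad_frame_ms_supported(int frame_ms) {
    return frame_ms == 10 || frame_ms == 20 || frame_ms == 30;
}

/* Number of 16-bit samples in one mono frame, or 0 if the rate/duration
 * combination is not one of the supported ones. */
static size_t vad_frame_samples(int sample_rate_hz, int frame_ms) {
    if (!vad_rate_supported(sample_rate_hz) ||
        !vad_frame_ms_supported(frame_ms)) {
        return 0;
    }
    return (size_t)sample_rate_hz / 1000u * (size_t)frame_ms;
}

/* Size of the same frame in bytes (two bytes per 16-bit sample). */
static size_t vad_frame_bytes(int sample_rate_hz, int frame_ms) {
    return vad_frame_samples(sample_rate_hz, frame_ms) * sizeof(int16_t);
}
```

For instance, vad_frame_samples(16000, 10) gives 160 samples and vad_frame_bytes(16000, 10) gives 320 bytes, so a capture buffer can be sized before any audio is read from the device.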

Here's an outline of an example that assumes audio_frame is 10 ms of audio at 16000 Hz (160 16-bit samples, 320 bytes):

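#include "webrtc/common_audio/vad/include/webrtc_vad.h"
// ...
VadInst *vad;
WebRtcVad_Create(&vad);  // allocate a VAD instance (the exact signature differs between WebRTC revisions)
WebRtcVad_Init(vad);     // reset the instance to its default state
// audio_frame: 160 samples of mono 16-bit PCM = 10 ms at 16000 Hz
int is_voiced = WebRtcVad_Process(vad, 16000, audio_frame, 160);  // 1 = voice, 0 = no voice, -1 = error
// ...
WebRtcVad_Free(vad);     // release the instance when finished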
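The VAD returns one decision per frame, while the question asks when the user starts and finishes talking. One common way to bridge that gap is to require several consecutive agreeing frames before changing state. The sketch below is an assumption layered on top of the VAD, not part of WebRTC: the type and function names are hypothetical, and the frame counts are placeholders to tune.

```c
#include <stdbool.h>

/* Hypothetical event type: none of these names come from WebRTC. */
typedef enum { TALK_NONE, TALK_STARTED, TALK_ENDED } talk_event;

typedef struct {
    bool talking;       /* current smoothed state */
    int  run;           /* consecutive frames contradicting that state */
    int  start_frames;  /* voiced frames needed to report a start */
    int  end_frames;    /* unvoiced frames needed to report an end */
} talk_tracker;

static void talk_tracker_init(talk_tracker *t, int start_frames,
                              int end_frames) {
    t->talking = false;
    t->run = 0;
    t->start_frames = start_frames;
    t->end_frames = end_frames;
}

/* Feed one per-frame decision: 1 = voiced, 0 = unvoiced.
 * Any other value (for example an error code) is ignored. */
static talk_event talk_tracker_update(talk_tracker *t, int is_voiced) {
    if (is_voiced != 0 && is_voiced != 1) {
        return TALK_NONE;
    }
    bool contradicts = (is_voiced == 1) != t->talking;
    if (!contradicts) {
        t->run = 0;  /* frame agrees with the current state */
        return TALK_NONE;
    }
    t->run++;
    int needed = t->talking ? t->end_frames : t->start_frames;
    if (t->run < needed) {
        return TALK_NONE;
    }
    t->talking = !t->talking;  /* enough contradicting frames: flip */
    t->run = 0;
    return t->talking ? TALK_STARTED : TALK_ENDED;
}
```

With 10 ms frames, talk_tracker_init(&t, 3, 30) would report a start after 30 ms of continuous voice and an end after 300 ms of continuous silence. A longer end_frames value keeps short pauses between words from being reported as the end of speech.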
John Wiseman answered Sep 17 '22 15:09


There are open-source implementations in the Sphinx and Freeswitch projects. I think they are all energy-based detectors, so they won't need any kind of model.

Sphinx 4 (Java but it should be easy to port to C/C++)

PocketSphinx

Freeswitch
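For reference, a minimal energy-based detector with the signature from the question might look like the sketch below. It is not taken from Sphinx or Freeswitch; it only illustrates the general idea of comparing a frame's mean energy to a fixed threshold. The threshold value is an assumption and has to be tuned for the microphone and environment, and the const qualifier is added to the question's signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed threshold on the mean squared amplitude, equivalent to an RMS
 * level of 500 out of 32767. A placeholder, not a recommended value. */
#define VAD_MEAN_SQUARE_THRESHOLD (500.0 * 500.0)

/* Returns true if the mean energy of the frame exceeds the threshold.
 * pcm:   mono 16-bit samples
 * count: number of samples in the frame */
bool isVAD(const short *pcm, size_t count) {
    if (pcm == NULL || count == 0) {
        return false;
    }
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        double s = (double)pcm[i];
        sum += s * s;  /* accumulate squared amplitude */
    }
    return sum / (double)count > VAD_MEAN_SQUARE_THRESHOLD;
}
```

A fixed threshold like this reacts to any loud sound, not only speech, and fails when the background noise level changes, which is why detectors of this kind usually adapt the threshold to a running estimate of the noise floor.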

Paul Dixon answered Sep 20 '22 15:09