
Most simple and fast method for audio activity detection?

Given an array of 320 elements (int16) that represents 20 ms of an audio signal (16-bit LPCM), I am looking for the simplest and fastest method to decide whether the array contains active audio (such as speech or music) rather than noise or silence. The decision does not need to be very accurate, but it must be very fast.

My first idea was to sum the squares or absolute values of the elements and compare the sum with a threshold, but that method is very slow on my system, even though it is O(n).

psihodelia asked Jul 01 '10 09:07


1 Answer

You're not going to get much faster than a sum-of-squares approach.
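For reference, a minimal sketch in C of the per-frame sum-of-squares check the question describes. The function names `frame_energy` and `frame_is_active` and the threshold value are illustrative assumptions, not taken from the question; the threshold has to be tuned for the application.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squared samples over one frame (hypothetical helper).
   A 64-bit accumulator is used because 320 * 32768^2 is roughly 3.4e11,
   which does not fit in 32 bits. */
static int64_t frame_energy(const int16_t *samples, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t s = samples[i];
        sum += (int64_t)s * s;
    }
    return sum;
}

/* Returns nonzero if the frame energy exceeds the caller-supplied
   threshold (an assumed, application-specific value). */
static int frame_is_active(const int16_t *samples, size_t n, int64_t threshold)
{
    return frame_energy(samples, n) > threshold;
}
```

For a 20 ms frame this is 320 multiply-accumulate operations, so calling `frame_is_active(frame, 320, threshold)` once per frame should be cheap on most targets.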

One optimization that you may not be using yet is a running total. That is, in each time step, instead of summing the squares of the last n samples, keep a running total and update it with the square of the most recent sample. To keep the running total from growing without bound, add an exponential decay. In pseudocode:

decay_constant=0.999;  // Some suitable value smaller than 1
total=0;
for t=1,...
    // Exponential decay
    total=total*decay_constant;

    // Add in the square of the latest sample
    total+=current_sample*current_sample;

    if total>threshold
        // do something
    end
end

Of course, you'll have to tune the decay constant and threshold to suit your application. If this isn't fast enough to run in real time, you have a seriously underpowered DSP...
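The running-total idea could be sketched in C roughly as follows. The `energy_detector` struct and `energy_detector_update` function are hypothetical names introduced for illustration, and using `float` for the total is an assumption; a fixed-point variant may suit a DSP without floating-point hardware better.

```c
#include <stdint.h>

/* State for an exponentially decaying sum of squared samples
   (hypothetical type, not from the original answer). */
typedef struct {
    float total;     /* decayed running sum of squares */
    float decay;     /* decay constant, smaller than 1, e.g. 0.999f */
    float threshold; /* application-specific, must be tuned */
} energy_detector;

/* Feed one sample; returns nonzero while the decayed energy
   is above the threshold. */
static int energy_detector_update(energy_detector *d, int16_t sample)
{
    float s = (float)sample;
    d->total = d->total * d->decay + s * s;
    return d->total > d->threshold;
}
```

For a signal of constant amplitude A, the total settles near A^2 / (1 - decay), which gives a starting point for choosing the threshold relative to the decay constant.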

Martin B answered Oct 04 '22 07:10