Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I interpret an AudioBuffer and get the power?

I am trying to make a volume-meter for my app, which will show while recording a video. I have found a lot of support for such meters for iOS, but mostly for AVAudioPlayer, which is no option for me. I am using AVCaptureSession to record, and will then end up with the delegate method shown below:

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);

    CFRetain(sampleBuffer);
    CFRetain(formatDescription);

    if(connection == audioConnection)
    {
        CMBlockBufferRef blockBuffer;
        AudioBufferList audioBufferList;

        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, 
            NULL, &audioBufferList, sizeof(AudioBufferList), NULL, NULL,
            kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
            &blockBuffer);

        SInt16 *data = audioBufferList.mBuffers[0].mData;
    }
    //Releases etc..
}

(Only showing relevant code)

Of what I understand, I receive a 'sample buffer', containing either audio or video. Once I've verified that the connection indeed is audio, then I 'extract' the audioBufferList from the buffer, and I am sitting here left with a list of one (or more?) audioBuffers. The actual data is, as I understand, represented as SInt16, or '16 bits signed integer', which as far as I understand has a range from -32,768 to 32,767. However, if I simply print out this received value, I get A LOT of bouncing numbers. When in "silence" I get values bouncing rapidly between -200 and 200, and when there's noise I get values from -4,000 to 13,000, completely out of order. As I've understood from reading, the value 0 will represent silence. However, I do not understand the difference between negative and positive values, as well as I do not know if the are able to reach all the way up/down to +-32,768.

I believe I need a percentage of how 'loud' it is, but have been unable to find anything.

I have read a couple of tutorials and references on the matter, but nothing makes sense to me. I followed one guide by doing this(appending to the code above, inside the if):

float accumulator = 0;
for(int i = 0; i < audioBufferList.mBuffers[0].mDataByteSize; i++)
    accumulator += data[i] * data[i];
float power = accumulator / audioBufferList.mBuffers[0].mDataByteSize;
float decibels = log10f(power);
NSLog(@"%f", decibels);

Apparently, this code was supposed to align from -1 to +1, but that did not happen. I am now getting values around 6.194681 when silence, and 7.773492 for some noise. This is feels like the correct 'range', but in the 'wrong place'. I can't simply subtract 7 from the number and assume I'm between -1 and +1. There should be some logic and science behind how this should work, but I do not know enough about how digital audio works.

Does anyone know the logic behind this? Is 0 always silence while -32,768 and 32,767 are loud noises? Can I then simply multiply all negative values by -1 to always get positive values, and then find out how many percent they are at (between 0 and 32767)? Somehow, I don't believe this will work, as I guess there is a reason for the negative values.. I'm not completely sure what to try.

like image 285
Sti Avatar asked Oct 31 '22 19:10

Sti


1 Answers

The code in your question is wrong in several ways. This code is trying to copy that from the article below, but you've not handled it properly converting from the float-based code in the article to 16-bit integer math. You're also looping on the wrong number of values (max i) and will end up pulling in garbage data. So this is all kinds of wrong.

https://www.mikeash.com/pyblog/friday-qa-2012-10-12-obtaining-and-interpreting-audio-data.html

The code in the article is correct. Here's what it is, expanded a bit. This is only looking at the first buffer in a 32-bit float buffer list.

float accumulator = 0;
AudioBuffer buffer = bufferList->mBuffers[0];
float * data = (float *)buffer.mData;
UInt32 numSamples = buffer.mDataByteSize / sizeof(float);

for (UInt32 i = 0; i < numSamples; i++) {
    accumulator += data[i] * data[i];
}
float power = accumulator / (float)numSamples;
float decibels = 10 * log10f(power);

As the article says, the result here is decibels uses 0dB reference. eg, 0.0 is the maximum value. This is the same thing that AVAudioPlayer's averagePowerForChannel returns for example.

To use this in your 16-bit integer context, you'd need to a) loop appropriately through each 16-bit sample, b) convert the data[i] value from a 16-bit integer to a floating point value in the [-1.0, 1.0] range before squaring and adding to the accumulator.

like image 138
seth Avatar answered Nov 15 '22 05:11

seth