Passing AVCaptureAudioDataOutput data into vDSP / Accelerate.framework

I am trying to create an application which runs an FFT on microphone data, so I can examine e.g. the loudest frequency in the input.

I see that there are many methods of getting audio input (the RemoteIO AudioUnit, AudioQueue services, and AVFoundation) but it seems like AVFoundation is the simplest. I have this setup:

// Configure the audio session
AVAudioSession *session = [AVAudioSession sharedInstance];
[session setCategory:AVAudioSessionCategoryRecord error:NULL];
[session setMode:AVAudioSessionModeMeasurement error:NULL];
[session setActive:YES error:NULL];

// Optional - default gives 1024 samples at 44.1kHz
//[session setPreferredIOBufferDuration:samplesPerSlice/session.sampleRate error:NULL];

// Configure the capture session (strongly-referenced instance variable, otherwise the capture stops after one slice)
_captureSession = [[AVCaptureSession alloc] init];

// Configure audio device input
AVCaptureDevice *device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:device error:NULL];
[_captureSession addInput:input];

// Configure audio data output
AVCaptureAudioDataOutput *output = [[AVCaptureAudioDataOutput alloc] init];
dispatch_queue_t queue = dispatch_queue_create("My callback", DISPATCH_QUEUE_SERIAL);
[output setSampleBufferDelegate:self queue:queue];
[_captureSession addOutput:output];

// Start the capture session.   
[_captureSession startRunning];

(plus error checking, omitted here for readability).

Then I implement the following AVCaptureAudioDataOutputSampleBufferDelegate method:

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    NSLog(@"Num samples: %ld", CMSampleBufferGetNumSamples(sampleBuffer));
    // Usually gives 1024 (except the first slice)
}

I'm unsure what the next step should be. What exactly does the CMSampleBuffer format describe (and what assumptions can be made about it, if any)? How should I get the raw audio data into vDSP_fft_zrip with the least possible amount of extra preprocessing? (Also, what would you recommend doing to verify that the raw data I see is correct?)

asked Dec 30 '12 by jtbandes

2 Answers

The CMSampleBufferRef is an opaque type that contains 0 or more media samples. There is a bit of blurb in the docs:

http://developer.apple.com/library/ios/#documentation/CoreMedia/Reference/CMSampleBuffer/Reference/reference.html

In this case it will contain an audio buffer, as well as a description of the sample format, timing information, and so on. If you are really interested, just put a breakpoint in the delegate callback and take a look.
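For example, a minimal debugging sketch along these lines (logging only; the fields come from the AudioStreamBasicDescription) will show what each buffer contains:

CMAudioFormatDescriptionRef fmt = CMSampleBufferGetFormatDescription(sampleBuffer);
const AudioStreamBasicDescription *asbd = CMAudioFormatDescriptionGetStreamBasicDescription(fmt);
NSLog(@"%.0f Hz, %u channel(s), %u bits/channel, format flags 0x%x",
      asbd->mSampleRate,
      (unsigned)asbd->mChannelsPerFrame,
      (unsigned)asbd->mBitsPerChannel,
      (unsigned)asbd->mFormatFlags);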

The first step is to get a pointer to the data buffer that has been returned:

// get a pointer to the audio bytes
CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer);
CMBlockBufferRef audioBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t lengthAtOffset;
size_t totalLength;
char *samples;
CMBlockBufferGetDataPointer(audioBuffer, 0, &lengthAtOffset, &totalLength, &samples);

The default sample format for the iPhone mic is linear PCM with 16-bit samples. This may be mono or stereo depending on whether an external microphone is connected. To calculate the FFT we need a float vector. Fortunately there is an Accelerate function to do the conversion for us:

// check what sample format we have
// this should always be linear PCM
// but may have 1 or 2 channels
CMAudioFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
const AudioStreamBasicDescription *desc = CMAudioFormatDescriptionGetStreamBasicDescription(format);
assert(desc->mFormatID == kAudioFormatLinearPCM);
if (desc->mChannelsPerFrame == 1 && desc->mBitsPerChannel == 16) {
    // convert the signed 16-bit integer samples to floats
    // (in practice, reuse this buffer across callbacks rather than
    // malloc/free it every time; see below)
    float *convertedSamples = malloc(numSamples * sizeof(float));
    vDSP_vflt16((short *)samples, 1, convertedSamples, 1, numSamples);
} else {
    // handle other cases as required
}

Now you have a float vector of the samples in the buffer, which you can use with vDSP_fft_zrip. It doesn't seem possible to change the input format from the microphone to float samples with AVFoundation, so you are stuck with this last conversion step. In practice I would keep the buffers around, reallocating them when a larger buffer arrives, so that you are not mallocing and freeing buffers with every delegate callback.
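To give an idea of the shape of that call, here is a minimal sketch of the FFT step, assuming 1024 mono float samples (log2n = 10) in convertedSamples from the snippet above; the FFTSetup should be created once and reused, not rebuilt in every callback:

#include <Accelerate/Accelerate.h>

// one-time setup, reused across callbacks
const vDSP_Length log2n = 10;                  // 2^10 = 1024 samples
FFTSetup fftSetup = vDSP_create_fftsetup(log2n, kFFTRadix2);

const vDSP_Length n = 1 << log2n;
const vDSP_Length halfN = n / 2;               // 512

// vDSP_fft_zrip works in place on a split-complex buffer,
// so pack the real input into even/odd halves first
float realp[512], imagp[512];
DSPSplitComplex splitComplex = { realp, imagp };
vDSP_ctoz((DSPComplex *)convertedSamples, 2, &splitComplex, 1, halfN);

// in-place forward FFT; the output is packed, with DC in realp[0]
// and the Nyquist component in imagp[0]
vDSP_fft_zrip(fftSetup, &splitComplex, 1, log2n, kFFTDirection_Forward);

// squared magnitude of each bin; bin i corresponds to i * sampleRate / n Hz
float magnitudes[512];
vDSP_zvmags(&splitComplex, 1, magnitudes, 1, halfN);

// find the loudest bin, skipping the packed DC/Nyquist bin 0
float maxMag;
vDSP_Length maxBin;
vDSP_maxvi(magnitudes + 1, 1, &maxMag, &maxBin, halfN - 1);
float loudestHz = (maxBin + 1) * desc->mSampleRate / n;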

As for your last question, I guess the easiest way would be to inject a known input and check that it gives you the correct response. You could play a sine wave into the mic and check that your FFT has a peak in the correct frequency bin, something like that.
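As a rough sketch of that check (runFFT here is a hypothetical wrapper around the vDSP code above that returns the index of the loudest bin):

// feed a known 1 kHz tone through the FFT path and verify the peak bin
const float sampleRate = 44100.0f;
const int n = 1024;
const float testHz = 1000.0f;
float testInput[1024];
for (int i = 0; i < n; i++) {
    testInput[i] = sinf(2.0f * (float)M_PI * testHz * i / sampleRate);
}
int peakBin = runFFT(testInput, n);                     // hypothetical wrapper
int expectedBin = (int)roundf(testHz * n / sampleRate); // bin 23 here
assert(peakBin == expectedBin);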

answered by Tark


I don't suggest using AVFoundation, for three reasons:

  1. I used it in some of my apps (morsedec, irtty); it works well in the simulator and on some hardware, but on others it failed completely!
  2. You do not have good control over the sample rate and format.
  3. Latency can be high.

I suggest starting with Apple's sample code aurioTouch. For the FFT you can move to the vDSP framework using a circular buffer (I LOVE https://github.com/michaeltyson/TPCircularBuffer), as sketched below.
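For example, a minimal sketch of the circular-buffer pattern, assuming TPCircularBuffer's stock API (samples and numSamples stand in for whatever float data your audio callback produces):

#import "TPCircularBuffer.h"

// one-time setup
TPCircularBuffer circBuffer;
TPCircularBufferInit(&circBuffer, 16384 * sizeof(float));

// in the audio callback: append whatever just arrived
TPCircularBufferProduceBytes(&circBuffer, samples,
                             (int32_t)(numSamples * sizeof(float)));

// on the processing side: consume one FFT-sized slice when enough is available
int32_t availableBytes;
float *tail = TPCircularBufferTail(&circBuffer, &availableBytes);
if (availableBytes >= 1024 * sizeof(float)) {
    // ... run the FFT on tail[0..1023] here ...
    TPCircularBufferConsume(&circBuffer, 1024 * sizeof(float));
}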

Hope this helps.

answered by jackdev23