I'm working on an iOS app that does two things at the same time: video chat and recording.
Unfortunately, when we configure the app with an audio unit to enable echo cancellation, the recording functionality breaks: the AVAssetWriterInput instance we're using to encode audio rejects incoming samples. When we don't set up the audio unit, recording works, but we have terrible echo.
To enable echo cancellation, we configure an audio unit like this (paraphrasing for the sake of brevity):
AudioComponentDescription desc;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
AudioComponent comp = AudioComponentFindNext(NULL, &desc);
OSStatus status = AudioComponentInstanceNew(comp, &_audioUnit);
status = AudioUnitInitialize(_audioUnit);
This works fine for video chat, but it breaks the recording functionality, which is set up like this (again, paraphrasing—the actual implementation is spread out over several methods).
_captureSession = [[AVCaptureSession alloc] init];
// Need to use the existing audio session & configuration to ensure we get echo cancellation
_captureSession.usesApplicationAudioSession = YES;
_captureSession.automaticallyConfiguresApplicationAudioSession = NO;
[_captureSession beginConfiguration];
AVCaptureDeviceInput *audioInput = [[AVCaptureDeviceInput alloc] initWithDevice:[self audioCaptureDevice] error:NULL];
[_captureSession addInput:audioInput];
_audioDataOutput = [[AVCaptureAudioDataOutput alloc] init];
[_audioDataOutput setSampleBufferDelegate:self queue:_cameraProcessingQueue];
[_captureSession addOutput:_audioDataOutput];
[_captureSession commitConfiguration];
And the relevant portion of captureOutput looks something like this:
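// basicFormat comes from the sample buffer's format description (not shown in the original paraphrase)
const AudioStreamBasicDescription *basicFormat = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));
NSLog(@"Audio format, channels: %u, sample rate: %f, format id: %u, bits per channel: %u", (unsigned)basicFormat->mChannelsPerFrame, basicFormat->mSampleRate, (unsigned)basicFormat->mFormatID, (unsigned)basicFormat->mBitsPerChannel);
if (_assetWriter.status == AVAssetWriterStatusWriting) {
    if (_audioEncoder.readyForMoreMediaData) {
        if (![_audioEncoder appendSampleBuffer:sampleBuffer]) {
            NSLog(@"Audio encoder couldn't append sample buffer");
        }
    }
}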
What happens is that the call to appendSampleBuffer fails, but (and this is the strange part) only if I don't have earphones plugged into my phone. Examining the logs produced when this happens, I found that without earphones connected, the number of channels reported in the log message was 3, whereas with earphones connected, the number of channels was 1. This explains why the encode operation was failing, since the encoder was configured to expect just a single channel.
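To make that mismatch visible before the append fails, a guard along the following lines could sit in front of the encoder. This is only a minimal sketch in plain C: `channels_match_encoder` is a hypothetical helper, not part of AVFoundation, and it assumes the caller already knows the channel count the encoder was configured with (1 in this app).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compares the channel count reported for an incoming
   buffer (mChannelsPerFrame in the log above) with the channel count the
   encoder was configured to accept. */
bool channels_match_encoder(uint32_t incoming_channels,
                            uint32_t expected_channels)
{
    /* An encoder configured for zero channels would be a setup bug. */
    assert(expected_channels > 0);
    return incoming_channels == expected_channels;
}
```

With the values from the logs, the 3-channel buffers seen without earphones would be flagged, while the 1-channel buffers seen with earphones would pass. The caller could log and drop the mismatched buffer rather than handing it to the encoder.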
What I don't understand is why I'm getting three channels here. If I comment out the code that initializes the audio unit, I only get a single channel and recording works fine, but echo cancellation doesn't work. Furthermore, if I remove these lines
// Need to use the existing audio session & configuration to ensure we get echo cancellation
_captureSession.usesApplicationAudioSession = YES;
_captureSession.automaticallyConfiguresApplicationAudioSession = NO;
recording works (I only get a single channel with or without headphones), but again, we lose echo cancellation.
So, the crux of my question is: why am I getting three channels of audio when I configure an audio unit to provide echo cancellation? Furthermore, is there any way to prevent this from happening or to work around this behavior using AVCaptureSession?
I've considered piping the microphone audio directly from the low-level audio unit callback into the encoder, as well as to the chat pipeline, but it seems like conjuring up the necessary Core Media buffers to do so would be a bit of work that I'd like to avoid if possible.
Note that the chat and recording functions were written by different people—neither of them me—which is the reason this code isn't more integrated. If possible, I'd like to avoid having to refactor the whole mess.
Ultimately, I was able to work around this issue by gathering audio samples from the microphone via the I/O audio unit, repackaging these samples into a CMSampleBuffer, and passing the newly constructed CMSampleBuffer into the encoder.
The code to do the conversion looks like this (abbreviated):
// Create a CMSampleBufferRef from the list of samples, which we'll own
AudioStreamBasicDescription monoStreamFormat;
memset(&monoStreamFormat, 0, sizeof(monoStreamFormat));
monoStreamFormat.mSampleRate = 48000;
monoStreamFormat.mFormatID = kAudioFormatLinearPCM;
monoStreamFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved;
monoStreamFormat.mBytesPerPacket = 2;
monoStreamFormat.mFramesPerPacket = 1;
monoStreamFormat.mBytesPerFrame = 2;
monoStreamFormat.mChannelsPerFrame = 1;
monoStreamFormat.mBitsPerChannel = 16;
CMFormatDescriptionRef format = NULL;
OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &monoStreamFormat, 0, NULL, 0, NULL, NULL, &format);
// Convert the AudioTimeStamp host time to a CMTime and create a CMSampleTimingInfo for this set of samples
uint64_t timeNS = (uint64_t)(hostTime * _hostTimeToNSFactor);
CMTime presentationTime = CMTimeMake(timeNS, 1000000000);
CMSampleTimingInfo timing = { CMTimeMake(1, 48000), presentationTime, kCMTimeInvalid };
CMSampleBufferRef sampleBuffer = NULL;
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numSamples, 1, &timing, 0, NULL, &sampleBuffer);
// add the samples to the buffer
status = CMSampleBufferSetDataBufferFromAudioBufferList(sampleBuffer,
// Pass the buffer into the encoder...
Please note that I've removed error handling and cleanup of the allocated objects.
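The snippet relies on `_hostTimeToNSFactor`, which is not shown. As a hedged sketch of the arithmetic involved, the plain C below converts host clock ticks to nanoseconds from a numerator/denominator timebase and wraps the result as a value/timescale pair like the one passed to CMTimeMake. It is an assumption that the factor is derived from such a timebase (on iOS, `mach_timebase_info` reports one). It also swaps in integer arithmetic where the snippet multiplies by a floating-point factor, and `rational_time`, `host_ticks_to_ns`, and `presentation_time_from_ns` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value/timescale pair, mirroring the two arguments
   the snippet passes to CMTimeMake. */
typedef struct {
    int64_t value;
    int32_t timescale;
} rational_time;

/* Convert host clock ticks to nanoseconds using a numer/denom timebase
   (assumed to be supplied by the caller). */
uint64_t host_ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    /* Divide first and handle the remainder separately, which reduces the
       risk of overflowing 64 bits compared with ticks * numer / denom. */
    uint64_t whole = ticks / denom;
    uint64_t rem = ticks % denom;
    return whole * numer + (rem * numer) / denom;
}

/* Express a nanosecond count as a rational time with a 1 GHz timescale,
   matching CMTimeMake(timeNS, 1000000000) in the snippet. */
rational_time presentation_time_from_ns(uint64_t ns)
{
    rational_time t = { (int64_t)ns, 1000000000 };
    return t;
}
```

For example, with a timebase of 125/3 (an illustrative value), 24,000,000 ticks convert to 1,000,000,000 ns, which is one second on the nanosecond timescale.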