I'm struggling to encode audio buffers received from AVCaptureSession using AudioConverter and then appending them to an AVAssetWriter.
I'm not getting any errors (including OSStatus responses), and the CMSampleBuffers generated seem to have valid data; however, the resulting file simply does not have any playable audio. When writing together with video, the video frames stop getting appended a couple of frames in (appendSampleBuffer() returns false, but with no AVAssetWriter.error), probably because the asset writer is waiting for the audio to catch up. I suspect it's related to the way I'm setting up the priming for AAC.
The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to understand for a wider audience.
Please see the comments in the code below for more details.
Given a settings struct:
import Foundation
import AVFoundation
import CleanroomLogger
public struct AVSettings {
let orientation: AVCaptureVideoOrientation = .Portrait
let sessionPreset = AVCaptureSessionPreset1280x720
let videoBitrate: Int = 2_000_000
let videoExpectedFrameRate: Int = 30
let videoMaxKeyFrameInterval: Int = 60
let audioBitrate: Int = 32 * 1024
/// Fields set to `0` mean variable rate.
/// `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
/// with values based on the input stream.
let audioOutputABSD = AudioStreamBasicDescription(
mSampleRate: AVAudioSession.sharedInstance().sampleRate,
mFormatID: kAudioFormatMPEG4AAC,
mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
mBytesPerPacket: 0,
mFramesPerPacket: 1024,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0)
let audioEncoderClassDescriptions = [
AudioClassDescription(
mType: kAudioEncoderComponentType,
mSubType: kAudioFormatMPEG4AAC,
mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]
}
Some helper functions:
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
switch (settings.sessionPreset, settings.orientation) {
case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
default: fatalError("Unsupported session preset and orientation")
}
}
public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
var result = noErr
var absd = settings.audioOutputABSD
var description: CMAudioFormatDescription?
withUnsafePointer(&absd) { absdPtr in
result = CMAudioFormatDescriptionCreate(nil,
absdPtr,
0, nil,
0, nil,
nil,
&description)
}
if result != noErr {
Log.error?.message("Could not create audio format description")
}
return description!
}
public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
var result = noErr
var description: CMVideoFormatDescription?
let (width, height) = getVideoDimensions(fromSettings: settings)
result = CMVideoFormatDescriptionCreate(nil,
kCMVideoCodecType_H264,
Int32(width),
Int32(height),
[:],
&description)
if result != noErr {
Log.error?.message("Could not create video format description")
}
return description!
}
This is how the asset writer is initialized:
guard let audioDevice = defaultAudioDevice() else
{ throw RecordError.MissingDeviceFeature("Microphone") }
guard let videoDevice = defaultVideoDevice(.Back) else
{ throw RecordError.MissingDeviceFeature("Camera") }
let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)
let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
outputSettings: nil,
sourceFormatHint: videoFormatHint)
let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
outputSettings: nil,
sourceFormatHint: audioFormatHint)
writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true
let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
.URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
.URLByAppendingPathExtension("mp4")
let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
if !assetWriter.canAddInput(writerVideoInput) {
throw RecordError.Unknown("Could not add video input") }
if !assetWriter.canAddInput(writerAudioInput) {
throw RecordError.Unknown("Could not add audio input") }
assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
And this is how audio samples are being encoded. The problem area is most likely around here. I've rewritten this so that it doesn't use any Rx-isms.
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)
var converter: AudioConverter?
// Indicates whether priming information has been attached to the first buffer
var primed = false
func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {
// Create the audio converter if it's not available
if converter == nil {
var classDescriptions = settings.audioEncoderClassDescriptions
var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(buffer)!).memory
var outputABSD = settings.audioOutputABSD
outputABSD.mSampleRate = inputABSD.mSampleRate
outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame
var converter: AudioConverterRef = nil
var result = noErr
result = withUnsafePointer(&outputABSD) { outputABSDPtr in
return withUnsafePointer(&inputABSD) { inputABSDPtr in
return AudioConverterNewSpecific(inputABSDPtr,
outputABSDPtr,
UInt32(classDescriptions.count),
&classDescriptions,
&converter)
}
}
if result != noErr { throw RecordError.Unknown("Could not create audio converter") }
// At this point I made an attempt to retrieve priming info from
// the audio converter assuming that it will give me back default values
// I can use, but ended up with `nil`
var primeInfo: AudioConverterPrimeInfo? = nil
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
// The following returns `noErr` but `primeInfo` is still `nil`
AudioConverterGetProperty(converter,
kAudioConverterPrimeInfo,
&primeInfoSize,
&primeInfo)
// I've also tried to set `kAudioConverterPrimeInfo` so that it knows
// the leading frames that are being primed, but the set didn't seem to work
// (`noErr` but getting the property afterwards still returned `nil`)
}
let converter = converter!
// Need to give a big enough output buffer.
// The assumption is that it will always be <= to the input size
let numSamples = CMSampleBufferGetNumSamples(buffer)
// This becomes 1024 * 2 = 2048
let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)
defer {
outputBufferPtr.destroy()
outputBufferPtr.dealloc(outputBufferSize)
}
var result = noErr
var outputPacketCount = UInt32(1)
var outputData = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: outputABSD.mChannelsPerFrame,
mDataByteSize: UInt32(outputBufferSize),
mData: outputBufferPtr))
// See below for `EncodeAudioUserData`
var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
inputBytesPerPacket: inputABSD.mBytesPerPacket)
withUnsafeMutablePointer(&userData) { userDataPtr in
// See below for `fetchAudioProc`
result = AudioConverterFillComplexBuffer(
converter,
fetchAudioProc,
userDataPtr,
&outputPacketCount,
&outputData,
nil)
}
if result != noErr {
Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
return nil
}
// See below for `CMSampleBufferCreateCopy`
guard let newBuffer = CMSampleBufferCreateCopy(buffer,
fromAudioBufferList: &outputData,
newFromatDescription: outputFormatDescription) else {
Log.error?.message("Could not create sample buffer from audio buffer list")
return nil
}
if !primed {
primed = true
// Simply picked 2112 samples based on convention, is there a better way to determine this?
let samplesToPrime: Int64 = 2112
let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)
// Without setting the attachment the asset writer will complain about the
// first buffer missing the `TrimDurationAtStart` attachment. Is there a way
// to infer the value from the given `AudioBufferList`?
CMSetAttachment(newBuffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, nil),
kCMAttachmentMode_ShouldNotPropagate)
}
return newBuffer
}
Below is the proc that fetches samples for the audio converter, and the data structure that gets passed to it:
private class EncodeAudioUserData {
var inputSampleBuffer: CMSampleBuffer?
var inputBytesPerPacket: UInt32
init(inputSampleBuffer: CMSampleBuffer,
inputBytesPerPacket: UInt32) {
self.inputSampleBuffer = inputSampleBuffer
self.inputBytesPerPacket = inputBytesPerPacket
}
}
private let fetchAudioProc: AudioConverterComplexInputDataProc = {
(inAudioConverter,
ioDataPacketCount,
ioData,
outDataPacketDescriptionPtrPtr,
inUserData) in
var result = noErr
if ioDataPacketCount.memory == 0 { return noErr }
let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory
// If it has already been processed
guard let buffer = userData.inputSampleBuffer else {
ioDataPacketCount.memory = 0
return -1
}
var inputBlockBuffer: CMBlockBuffer?
var inputBufferList = AudioBufferList()
result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
buffer,
nil,
&inputBufferList,
sizeof(AudioBufferList),
nil,
nil,
0,
&inputBlockBuffer)
if result != noErr {
Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
ioDataPacketCount.memory = 0
return result
}
let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
ioDataPacketCount.memory = packetsCount
ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData
if outDataPacketDescriptionPtrPtr != nil {
outDataPacketDescriptionPtrPtr.memory = nil
}
return noErr
}
This is how I am converting AudioBufferLists to CMSampleBuffers:
public func CMSampleBufferCreateCopy(
buffer: CMSampleBuffer,
inout fromAudioBufferList bufferList: AudioBufferList,
newFromatDescription formatDescription: CMFormatDescription? = nil)
-> CMSampleBuffer? {
var result = noErr
var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
// Copy timing info from the previous buffer
var timingInfo = CMSampleTimingInfo()
result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
if result != noErr { return nil }
var newBuffer: CMSampleBuffer?
result = CMSampleBufferCreateReady(
kCFAllocatorDefault,
nil,
formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
Int(bufferList.mNumberBuffers),
1, &timingInfo,
1, &sizeArray,
&newBuffer)
if result != noErr { return nil }
guard let b = newBuffer else { return nil }
CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
return newBuffer
}
Is there anything that I am obviously doing wrong? Is there a proper way to construct CMSampleBuffers from AudioBufferList? How do you transfer priming information from the converter to CMSampleBuffers that you create?
For my use case I need to do the encoding manually as the buffers will be manipulated further down the pipeline (although I've disabled all transformations after the encode in order to make sure that it works.)
Any help would be much appreciated. Sorry that there's so much code to digest, but I wanted to provide as much context as possible.
Thanks in advance :)
Turns out there were a variety of things that I was doing wrong. Instead of posting a garble of code, I'm going to try and organize this into bite-sized pieces of things that I discovered.
This had been a huge source of confusion for me:
- A CMSampleBuffer can have 1 or more samples (discovered via CMSampleBufferGetNumSamples).
- A CMSampleBuffer that contains 1 sample represents a single audio packet.
- Therefore, CMSampleBufferGetNumSamples(sample) will return the number of packets contained in the given buffer.
- The number of frames in each packet is given by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no relationship between the total size and frame count.
AudioConverterComplexInputDataProc
This callback is used to retrieve more linear PCM audio data for encoding. You must supply at least the number of packets specified by ioNumberDataPackets. Since I've been using the converter for real-time push-style encoding, I needed to ensure that each data push contains the minimum number of packets. Something like this (pseudo-code):
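// Packets the converter needs per call, e.g. 1024 / 1 = 1024 for LPCM input and AAC output
let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
// Keep queuing input until enough packets are available
while getTotalSize(buffers) < minimumPackets {
    buffers.append(getNextBuffer())
}
AudioConverterFillComplexBuffer(...)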
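The accumulation step can be tried out without any capture hardware. Here is a rough sketch in current Swift syntax (not the Swift 2 used in the question). `PCMChunk`, `PacketAccumulator` and `minimumInputPackets` are hypothetical names made up for illustration; a real implementation would queue `CMSampleBuffer`s and read their packet counts with `CMSampleBufferGetNumSamples`.

```swift
// Hypothetical stand-in for a captured PCM buffer: only its packet count matters here.
struct PCMChunk {
    let packetCount: Int
}

/// Number of input packets needed to produce one output packet.
/// Linear PCM has 1 frame per packet, so one 1024-frame AAC packet needs 1024 PCM packets.
func minimumInputPackets(outputFramesPerPacket: Int, inputFramesPerPacket: Int) -> Int {
    return outputFramesPerPacket / inputFramesPerPacket
}

/// Queues pushed chunks and reports when enough packets are available to run the converter.
struct PacketAccumulator {
    let minimumPackets: Int
    private(set) var pending: [PCMChunk] = []

    init(minimumPackets: Int) {
        self.minimumPackets = minimumPackets
    }

    /// Total number of packets currently queued.
    var pendingPackets: Int {
        return pending.reduce(0) { $0 + $1.packetCount }
    }

    /// Returns true once the queued chunks hold at least `minimumPackets` packets.
    mutating func push(_ chunk: PCMChunk) -> Bool {
        pending.append(chunk)
        return pendingPackets >= minimumPackets
    }
}

let needed = minimumInputPackets(outputFramesPerPacket: 1024, inputFramesPerPacket: 1)
var accumulator = PacketAccumulator(minimumPackets: needed)
let readyAfterFirst = accumulator.push(PCMChunk(packetCount: 512))   // 512 queued, not enough yet
let readyAfterSecond = accumulator.push(PCMChunk(packetCount: 512))  // 1024 queued, enough to encode
```

Only when `push` returns true would the converter be called, so the input callback is never asked for more packets than are queued.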
Slicing CMSampleBuffers
You can actually slice CMSampleBuffers if they contain multiple samples. The tool to do this is CMSampleBufferCopySampleBufferForRange. This is useful because you can provide the AudioConverterComplexInputDataProc with the exact number of packets that it asks for, which makes handling timing information for the resulting encoded buffer easier: if you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
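To make the bookkeeping concrete, here is a small sketch in current Swift syntax of how the slice boundaries could be computed. `sliceRanges` and `duration` are hypothetical helpers, not part of Core Media; each returned range would presumably be turned into the `CFRange` passed to `CMSampleBufferCopySampleBufferForRange`, and the leftover packets kept for the next push.

```swift
/// Splits `totalPackets` queued packets into ranges of exactly `chunkSize` packets.
/// Returns the full ranges plus the number of leftover packets to carry over.
func sliceRanges(totalPackets: Int, chunkSize: Int) -> (ranges: [Range<Int>], leftover: Int) {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var ranges: [Range<Int>] = []
    var start = 0
    while start + chunkSize <= totalPackets {
        ranges.append(start..<(start + chunkSize))
        start += chunkSize
    }
    return (ranges, totalPackets - start)
}

/// Duration in seconds of a buffer holding `frames` frames.
func duration(frames: Int, sampleRate: Double) -> Double {
    return Double(frames) / sampleRate
}

// 1500 queued PCM packets, converter expects 1024 per call
let sliced = sliceRanges(totalPackets: 1500, chunkSize: 1024)
// sliced.ranges == [0..<1024], sliced.leftover == 476
let encodedDuration = duration(frames: 1024, sampleRate: 44100)  // 1024 / 44100 seconds
```

With exact slices, the duration of every encoded buffer is always 1024/sampleRate, so the presentation timestamps of consecutive buffers line up without any correction.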
When doing AAC encoding, you must set the trim duration like so:
CMSetAttachment(buffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
kCMAttachmentMode_ShouldNotPropagate)
One thing I did wrong was that I added the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.
Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:
- Priming frames: 2112
- Sample rate: 44100
- Priming duration: 2112 / 44100 = ~0.0479s
- First buffer, frames: 1024, priming duration: 1024 / 44100
- Second buffer, frames: 1024, priming duration: 1088 / 44100
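The arithmetic can be checked with a short sketch in current Swift syntax. `trimFrames` is a hypothetical helper, and it rests on an assumption: it applies the rule that a trim must not exceed its buffer's duration strictly. Under that reading, the 1088 frames still owed after the first buffer (2112 - 1024) cannot all go on a 1024-frame buffer, so they are split as 1024 on the second buffer and 64 on a third. That split is my interpretation of the rule, not something confirmed elsewhere in this post.

```swift
/// Spreads `primingFrames` over consecutive buffers, never assigning a buffer
/// more trim than the number of frames it holds.
/// Returns the trim (in frames) for each buffer, in order.
func trimFrames(primingFrames: Int, bufferFrameCounts: [Int]) -> [Int] {
    var remaining = primingFrames
    return bufferFrameCounts.map { (frames: Int) -> Int in
        let trim = min(remaining, frames)
        remaining -= trim
        return trim
    }
}

let trims = trimFrames(primingFrames: 2112, bufferFrameCounts: [1024, 1024, 1024, 1024])
// trims == [1024, 1024, 64, 0]
let primingSeconds = Double(2112) / 44100.0  // roughly 0.0479 s
```

Each non-zero entry would then be converted to a `CMTime` with the sample rate as timescale and attached to the matching buffer, the same way `primingDuration` is attached above.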
CMSampleBuffer
AudioConverterFillComplexBuffer has an optional outPacketDescription parameter. You should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:
// Placeholders: the encoded output and the packet descriptions
// filled in by AudioConverterFillComplexBuffer
let bufferList: AudioBufferList
let packetDescriptions: [AudioStreamPacketDescription]
var newBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateWithPacketDescriptions(
    kCFAllocatorDefault, // allocator
    nil, // dataBuffer
    false, // dataReady
    nil, // makeDataReadyCallback
    nil, // makeDataReadyRefCon
    formatDescription, // formatDescription
    Int(bufferList.mNumberBuffers), // numSamples
    CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
    packetDescriptions, // packetDescriptions (a `let` array is passed without `&`)
    &newBuffer) // sBufOut