I am looking to extract sample-accurate ranges of LPCM audio from audio tracks within video files. Currently, I'm trying to achieve this using AVAssetReaderTrackOutput against an AVAssetTrack obtained from an AVURLAsset.
Despite ensuring the asset is initialized with AVURLAssetPreferPreciseDurationAndTimingKey set to YES, seeking to a sample-accurate position within the asset appears to be inaccurate.
NSDictionary *options = @{ AVURLAssetPreferPreciseDurationAndTimingKey : @(YES) };
_asset = [[AVURLAsset alloc] initWithURL:fileURL options:options];
This manifests with, for example, variable-bit-rate (VBR) encoded AAC streams. While I know that seeking accurately in VBR audio streams carries a performance overhead, I'm willing to pay it provided I am delivered accurate samples.
When using, e.g., Extended Audio File Services and the ExtAudioFileRef APIs, I can achieve sample-accurate seeks and extraction of audio. Likewise with AVAudioFile, as it builds on top of ExtAudioFileRef.
The issue, however, is that I would also like to extract audio from media containers that the audio-file-only APIs reject, but which AVFoundation supports via AVURLAsset.
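For comparison, the ExtAudioFile path that does give me sample-accurate results looks roughly like this. This is a sketch; the hard-coded 44.1 kHz stereo float client format and the readRange name are illustrative assumptions:

```objc
#import <AudioToolbox/AudioToolbox.h>

// Sketch: sample-accurate extraction via ExtAudioFile. The client format
// (44.1 kHz, stereo, interleaved 32-bit float) is an assumption for
// illustration; outSamples must hold frameCount * 2 floats.
static OSStatus readRange( CFURLRef fileURL, SInt64 startFrame, UInt32 frameCount, float *outSamples )
{
    ExtAudioFileRef file = NULL;
    OSStatus status = ExtAudioFileOpenURL( fileURL, &file );
    if( status != noErr ) return status;

    AudioStreamBasicDescription clientFormat = {0};
    clientFormat.mSampleRate       = 44100.0;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    clientFormat.mChannelsPerFrame = 2;
    clientFormat.mBitsPerChannel   = 32;
    clientFormat.mBytesPerFrame    = clientFormat.mChannelsPerFrame * sizeof( float );
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerPacket   = clientFormat.mBytesPerFrame;
    ExtAudioFileSetProperty( file, kExtAudioFileProperty_ClientDataFormat,
                             sizeof( clientFormat ), &clientFormat );

    // The seek is expressed in sample frames of the client format - this is
    // the sample-accurate positioning the AVAssetReader path doesn't deliver.
    status = ExtAudioFileSeek( file, startFrame );
    if( status != noErr ) { ExtAudioFileDispose( file ); return status; }

    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = clientFormat.mChannelsPerFrame;
    bufferList.mBuffers[0].mDataByteSize   = frameCount * clientFormat.mBytesPerFrame;
    bufferList.mBuffers[0].mData           = outSamples;

    UInt32 framesRead = frameCount;
    status = ExtAudioFileRead( file, &framesRead, &bufferList );

    ExtAudioFileDispose( file );
    return status;
}
```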
A sample-accurate time range for extraction is defined using CMTime and CMTimeRange, and set on the AVAssetReader. Samples are then extracted iteratively.
-(NSData *)readFromFrame:(SInt64)startFrame
     requestedFrameCount:(UInt32)frameCount
{
    NSUInteger expectedByteCount = frameCount * _bytesPerFrame;
    NSMutableData *data = [NSMutableData dataWithCapacity:expectedByteCount];

    //
    // Configure Output
    //
    NSDictionary *settings = @{ AVFormatIDKey : @( kAudioFormatLinearPCM ),
                                AVLinearPCMIsNonInterleaved : @( NO ),
                                AVLinearPCMIsBigEndianKey : @( NO ),
                                AVLinearPCMIsFloatKey : @( YES ),
                                AVLinearPCMBitDepthKey : @( 32 ),
                                AVNumberOfChannelsKey : @( 2 ) };

    AVAssetReaderOutput *output = [[AVAssetReaderTrackOutput alloc] initWithTrack:_track outputSettings:settings];

    CMTime startTime = CMTimeMake( startFrame, _sampleRate );
    CMTime durationTime = CMTimeMake( frameCount, _sampleRate );
    CMTimeRange range = CMTimeRangeMake( startTime, durationTime );

    //
    // Configure Reader
    //
    NSError *error = nil;
    AVAssetReader *reader = [[AVAssetReader alloc] initWithAsset:_asset error:&error];

    if( !reader )
    {
        fprintf( stderr, "avf : failed to initialize reader\n" );
        fprintf( stderr, "avf : %s\n%s\n", error.localizedDescription.UTF8String, error.localizedFailureReason.UTF8String );
        exit( EXIT_FAILURE );
    }

    [reader addOutput:output];
    [reader setTimeRange:range];

    BOOL startOK = [reader startReading];

    NSAssert( startOK && reader.status == AVAssetReaderStatusReading, @"Ensure we've started reading." );
    NSAssert( _asset.providesPreciseDurationAndTiming, @"We expect the asset to provide accurate timing." );

    //
    // Start reading samples
    //
    CMSampleBufferRef sample = NULL;

    while(( sample = [output copyNextSampleBuffer] ))
    {
        CMTime presentationTime = CMSampleBufferGetPresentationTimeStamp( sample );

        if( data.length == 0 )
        {
            // First read - we should be at the expected presentation time requested.
            int32_t comparisonResult = CMTimeCompare( presentationTime, startTime );
            NSAssert( comparisonResult == 0, @"We expect sample accurate seeking" );
        }

        CMBlockBufferRef buffer = CMSampleBufferGetDataBuffer( sample );

        if( !buffer )
        {
            fprintf( stderr, "avf : failed to obtain buffer\n" );
            exit( EXIT_FAILURE );
        }

        size_t lengthAtOffset = 0;
        size_t totalLength = 0;
        char *bufferData = NULL;

        if( CMBlockBufferGetDataPointer( buffer, 0, &lengthAtOffset, &totalLength, &bufferData ) != kCMBlockBufferNoErr )
        {
            fprintf( stderr, "avf : failed to get sample\n" );
            exit( EXIT_FAILURE );
        }

        if( bufferData && lengthAtOffset == totalLength )
        {
            [data appendBytes:bufferData length:totalLength];
        }
        else
        {
            // Block buffer is not contiguous - copy the full range out rather
            // than appending only the first contiguous run.
            size_t offset = data.length;
            [data increaseLengthBy:totalLength];
            CMBlockBufferCopyDataBytes( buffer, 0, totalLength, (char *)data.mutableBytes + offset );
        }

        CFRelease( sample );
    }

    NSAssert( reader.status == AVAssetReaderStatusCompleted, @"Completed reading" );

    [output release];
    [reader release];

    return [NSData dataWithData:data];
}
The presentation time that CMSampleBufferGetPresentationTimeStamp reports seems to match what I sought - but since the seek itself is inaccurate, I have no way to correct for the error and align the samples I retrieve.
Any thoughts on how to do this?
Alternatively, is there a way to adapt an AVAssetTrack to be used by AVAudioFile or ExtAudioFile?
Is it possible to access the audio track via AudioFileOpenWithCallbacks?
Is it possible to get at the audio stream from a video container in a different manner on macOS?
One procedure that works is to use AVAssetReader to read your compressed AV file, in conjunction with AVAssetWriter, to write a new raw LPCM file of the audio samples. One can then quickly index into this new PCM file (or a memory-mapped array, if necessary) to extract exact sample-accurate ranges, without incurring VBR per-packet decoding size anomalies or depending on CMTime timestamp algorithms outside one's control.
This may not be the most time- or memory-efficient procedure, but it works.
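The procedure above can be sketched as follows. This is a minimal outline, not a drop-in implementation: the function name, the choice of a CAF container for the decoded output, and the absence of error handling are all assumptions for illustration:

```objc
#import <AVFoundation/AVFoundation.h>

// Sketch: decode an asset's audio to a raw LPCM file (here wrapped in CAF)
// via AVAssetReader + AVAssetWriter. Error handling is abbreviated.
static void writeDecodedPCM( AVAsset *asset, NSURL *outputURL )
{
    AVAssetTrack *track = [[asset tracksWithMediaType:AVMediaTypeAudio] firstObject];

    // Ask the reader to decode to LPCM.
    NSDictionary *decodeSettings = @{ AVFormatIDKey : @( kAudioFormatLinearPCM ) };
    AVAssetReaderTrackOutput *output =
        [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track outputSettings:decodeSettings];

    NSError *error = nil;
    AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
    [reader addOutput:output];

    AVAssetWriter *writer = [AVAssetWriter assetWriterWithURL:outputURL
                                                     fileType:AVFileTypeCoreAudioFormat
                                                        error:&error];
    // nil outputSettings passes the (already LPCM) samples through unmodified.
    AVAssetWriterInput *input =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:nil];
    [writer addInput:input];

    [reader startReading];
    [writer startWriting];
    [writer startSessionAtSourceTime:kCMTimeZero];

    dispatch_queue_t queue = dispatch_queue_create( "pcm.write", NULL );
    dispatch_semaphore_t done = dispatch_semaphore_create( 0 );

    [input requestMediaDataWhenReadyOnQueue:queue usingBlock:^{
        while( input.readyForMoreMediaData )
        {
            CMSampleBufferRef sample = [output copyNextSampleBuffer];
            if( !sample )
            {
                [input markAsFinished];
                [writer finishWritingWithCompletionHandler:^{ dispatch_semaphore_signal( done ); }];
                break;
            }
            [input appendSampleBuffer:sample];
            CFRelease( sample );
        }
    }];

    dispatch_semaphore_wait( done, DISPATCH_TIME_FOREVER );
}
```

Once the file is written, a range of frames is just a byte offset and length into fixed-size LPCM frames, so extraction becomes simple arithmetic.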