I am looking to extract sample-accurate ranges of LPCM audio from audio tracks within video files. Currently, I'm trying to achieve this using AVAssetReaderTrackOutput against an AVAssetTrack obtained from an AVURLAsset.
Despite ensuring the asset is initialized with AVURLAssetPreferPreciseDurationAndTimingKey set to YES, seeking to a sample-accurate position within the asset appears to be inaccurate.
NSDictionary *options = @{ AVURLAssetPreferPreciseDurationAndTimingKey : @(YES) };
_asset = [[AVURLAsset alloc] initWithURL:fileURL options:options];
This manifests with, for example, variable-bit-rate (VBR) encoded AAC streams. While I know that seeking accurately in VBR audio streams carries a performance overhead, I'm willing to pay it provided I am delivered accurate samples.
When using, e.g., Extended Audio File Services and the ExtAudioFileRef APIs, I can achieve sample-accurate seeks and extraction of audio. Likewise with AVAudioFile, as it builds on top of ExtAudioFileRef.
The issue, however, is that I would also like to extract audio from media containers that the audio-file-only APIs reject, but which AVFoundation supports via AVURLAsset.
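For comparison, the ExtAudioFile path that does give me sample-accurate results looks roughly like this. This is a sketch; the hard-coded 44.1 kHz stereo float client format and the readRange name are illustrative assumptions:

```objc
#import <AudioToolbox/AudioToolbox.h>

// Sketch: sample-accurate extraction via ExtAudioFile. The client format
// (44.1 kHz, stereo, interleaved 32-bit float) is an assumption for
// illustration; outSamples must hold frameCount * 2 floats.
static OSStatus readRange( CFURLRef fileURL, SInt64 startFrame, UInt32 frameCount, float *outSamples )
{
    ExtAudioFileRef file = NULL;
    OSStatus status = ExtAudioFileOpenURL( fileURL, &file );
    if( status != noErr ) return status;

    AudioStreamBasicDescription clientFormat = {0};
    clientFormat.mSampleRate       = 44100.0;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    clientFormat.mChannelsPerFrame = 2;
    clientFormat.mBitsPerChannel   = 32;
    clientFormat.mBytesPerFrame    = clientFormat.mChannelsPerFrame * sizeof( float );
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerPacket   = clientFormat.mBytesPerFrame;
    ExtAudioFileSetProperty( file, kExtAudioFileProperty_ClientDataFormat,
                             sizeof( clientFormat ), &clientFormat );

    // The seek is expressed in sample frames of the client format - this is
    // the sample-accurate positioning the AVAssetReader path doesn't deliver.
    status = ExtAudioFileSeek( file, startFrame );
    if( status != noErr ) { ExtAudioFileDispose( file ); return status; }

    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = clientFormat.mChannelsPerFrame;
    bufferList.mBuffers[0].mDataByteSize   = frameCount * clientFormat.mBytesPerFrame;
    bufferList.mBuffers[0].mData           = outSamples;

    UInt32 framesRead = frameCount;
    status = ExtAudioFileRead( file, &framesRead, &bufferList );

    ExtAudioFileDispose( file );
    return status;
}
```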
A sample-accurate time range for extraction is defined using CMTime and CMTimeRange, and set on the AVAssetReader. Samples are then extracted iteratively.
-(NSData *)readFromFrame:(SInt64)startFrame
     requestedFrameCount:(UInt32)frameCount
{
    NSUInteger expectedByteCount = frameCount * _bytesPerFrame;
    NSMutableData *data = [NSMutableData dataWithCapacity:expectedByteCount];

    //
    // Configure Output
    //
    NSDictionary *settings = @{ AVFormatIDKey : @( kAudioFormatLinearPCM ),
                                AVLinearPCMIsNonInterleaved : @( NO ),
                                AVLinearPCMIsBigEndianKey : @( NO ),
                                AVLinearPCMIsFloatKey : @( YES ),
                                AVLinearPCMBitDepthKey : @( 32 ),
                                AVNumberOfChannelsKey : @( 2 ) };

    AVAssetReaderOutput *output = [[AVAssetReaderTrackOutput alloc] initWithTrack:_track outputSettings:settings];

    CMTime startTime = CMTimeMake( startFrame, _sampleRate );
    CMTime durationTime = CMTimeMake( frameCount, _sampleRate );
    CMTimeRange range = CMTimeRangeMake( startTime, durationTime );

    //
    // Configure Reader
    //
    NSError *error = nil;
    AVAssetReader *reader = [[AVAssetReader alloc] initWithAsset:_asset error:&error];

    if( !reader )
    {
        fprintf( stderr, "avf : failed to initialize reader\n" );
        fprintf( stderr, "avf : %s\n%s\n", error.localizedDescription.UTF8String, error.localizedFailureReason.UTF8String );
        exit( EXIT_FAILURE );
    }

    [reader addOutput:output];
    [reader setTimeRange:range];

    BOOL startOK = [reader startReading];

    NSAssert( startOK && reader.status == AVAssetReaderStatusReading, @"Ensure we've started reading." );
    NSAssert( _asset.providesPreciseDurationAndTiming, @"We expect the asset to provide accurate timing." );

    //
    // Start reading samples
    //
    CMSampleBufferRef sample = NULL;

    while(( sample = [output copyNextSampleBuffer] ))
    {
        CMTime presentationTime = CMSampleBufferGetPresentationTimeStamp( sample );

        if( data.length == 0 )
        {
            // First read - we should be at the expected presentation time requested.
            int32_t comparisonResult = CMTimeCompare( presentationTime, startTime );
            NSAssert( comparisonResult == 0, @"We expect sample accurate seeking" );
        }

        CMBlockBufferRef buffer = CMSampleBufferGetDataBuffer( sample );

        if( !buffer )
        {
            fprintf( stderr, "avf : failed to obtain buffer\n" );
            exit( EXIT_FAILURE );
        }

        size_t lengthAtOffset = 0;
        size_t totalLength = 0;
        char *bufferData = NULL;

        if( CMBlockBufferGetDataPointer( buffer, 0, &lengthAtOffset, &totalLength, &bufferData ) != kCMBlockBufferNoErr )
        {
            fprintf( stderr, "avf : failed to get sample\n" );
            exit( EXIT_FAILURE );
        }

        if( bufferData && lengthAtOffset == totalLength )
        {
            [data appendBytes:bufferData length:totalLength];
        }
        else
        {
            // Block buffer is not contiguous - copy the full range out rather
            // than appending only the first contiguous run.
            size_t offset = data.length;
            [data increaseLengthBy:totalLength];
            CMBlockBufferCopyDataBytes( buffer, 0, totalLength, (char *)data.mutableBytes + offset );
        }

        CFRelease( sample );
    }

    NSAssert( reader.status == AVAssetReaderStatusCompleted, @"Completed reading" );

    [output release];
    [reader release];

    return [NSData dataWithData:data];
}
The presentation time that CMSampleBufferGetPresentationTimeStamp reports seems to match what I sought - but since the seek itself is inaccurate, I have no way to correct for the error and align the samples I retrieve.
Any thoughts on how to do this?
Alternatively, is there a way to adapt an AVAssetTrack to be used by AVAudioFile or ExtAudioFile?
Is it possible to access the audio track via AudioFileOpenWithCallbacks?
Is it possible to get at the audio stream from a video container in a different manner on macOS?
One procedure that works is to use AVAssetReader to read your compressed AV file, in conjunction with AVAssetWriter, to write a new raw LPCM file of the audio samples. One can then quickly index into this new PCM file (or a memory-mapped array, if necessary) to extract exact sample-accurate ranges, without incurring VBR per-packet decoding size anomalies or depending on CMTime timestamp algorithms outside one's control.
This may not be the most time- or memory-efficient procedure, but it works.
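The procedure above can be sketched as follows. This is a minimal outline, not a drop-in implementation: the function name, the choice of a CAF container for the decoded output, and the absence of error handling are all assumptions for illustration:

```objc
#import <AVFoundation/AVFoundation.h>

// Sketch: decode an asset's audio to a raw LPCM file (here wrapped in CAF)
// via AVAssetReader + AVAssetWriter. Error handling is abbreviated.
static void writeDecodedPCM( AVAsset *asset, NSURL *outputURL )
{
    AVAssetTrack *track = [[asset tracksWithMediaType:AVMediaTypeAudio] firstObject];

    // Ask the reader to decode to LPCM.
    NSDictionary *decodeSettings = @{ AVFormatIDKey : @( kAudioFormatLinearPCM ) };
    AVAssetReaderTrackOutput *output =
        [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track outputSettings:decodeSettings];

    NSError *error = nil;
    AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
    [reader addOutput:output];

    AVAssetWriter *writer = [AVAssetWriter assetWriterWithURL:outputURL
                                                     fileType:AVFileTypeCoreAudioFormat
                                                        error:&error];
    // nil outputSettings passes the (already LPCM) samples through unmodified.
    AVAssetWriterInput *input =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:nil];
    [writer addInput:input];

    [reader startReading];
    [writer startWriting];
    [writer startSessionAtSourceTime:kCMTimeZero];

    dispatch_queue_t queue = dispatch_queue_create( "pcm.write", NULL );
    dispatch_semaphore_t done = dispatch_semaphore_create( 0 );

    [input requestMediaDataWhenReadyOnQueue:queue usingBlock:^{
        while( input.readyForMoreMediaData )
        {
            CMSampleBufferRef sample = [output copyNextSampleBuffer];
            if( !sample )
            {
                [input markAsFinished];
                [writer finishWritingWithCompletionHandler:^{ dispatch_semaphore_signal( done ); }];
                break;
            }
            [input appendSampleBuffer:sample];
            CFRelease( sample );
        }
    }];

    dispatch_semaphore_wait( done, DISPATCH_TIME_FOREVER );
}
```

Once the file is written, a range of frames is just a byte offset and length into fixed-size LPCM frames, so extraction becomes simple arithmetic.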