Image buffer display order with VTDecompressionSession

Question

I have a project where I need to decode h264 video from a live network stream and eventually end up with a texture I can display in another framework (Unity3D) on iOS devices. I can successfully decode the video using VTDecompressionSession and then grab the texture with CVMetalTextureCacheCreateTextureFromImage (or the OpenGL variant). It works great when I use a low-latency encoder and the image buffers come out in display order, however, when I use the regular encoder the image buffers do not come out in display order and reordering the image buffers is apparently far more difficult that I expected.

The first attempt was to set the VTDecodeFrameFlags with kVTDecodeFrame_EnableAsynchronousDecompression and kVTDecodeFrame_EnableTemporalProcessing... However, it turns out that VTDecompressionSession can choose to ignore the flag and do whatever it wants... and in my case, it chooses to ignore the flag and still outputs the buffer in encoder order (not display order). Essentially useless.

The next attempt was to associate the image buffers with the presentation time stamp and then throw them into a vector which would allow me to grab the image buffer I needed when I create the texture. The problem seems to be that the image buffer that goes into the VTDecompressionSession, which is associated with a time stamp, is no longer the same buffer that comes out, essentially making the time stamp useless.

For example, going into the decoder...

  VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
  VTDecodeInfoFlags flagOut;
  // Presentation time stamp to be passed with the buffer
  NSNumber *nsPts = [NSNumber numberWithDouble:pts];

  VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
                                          (void*)CFBridgingRetain(nsPts), &flagOut);

On the callback side...

void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef imageBuffer, CMTime presentationTimeStamp, CMTime presentationDuration)
 {
      // The presentation time stamp...
      // No longer seems to be associated with the buffer that it went in with!
      NSNumber* pts = CFBridgingRelease(sourceFrameRefCon);
 }

When ordered, the time stamps on the callback side increase monotonically at the expected rate, but the buffers are not in the right order. Does anyone see where I am making an error here? Or know how to determine the order of the buffers on the callback side? At this point I have tried just about everything I can think of.

Kaleb · Accepted Answer

In my case, the problem wasn't with VTDecompressionSession, it was a problem with the demuxer getting the wrong PTS. While I couldn't get VTDecompressionSession to put out the frames in temporal (display) order with the kVTDecodeFrame_EnableAsynchronousDecompression and kVTDecodeFrame_EnableTemporalProcessing flags, I could sort the frames myself based on PTS with a small vector.

First, make sure you associate all of your timing information with your CMSampleBuffer along with the block buffer so you receive it in the VTDecompressionSession callback.

// Wrap our CMBlockBuffer in a CMSampleBuffer...
CMSampleBufferRef sampleBuffer;

CMTime duration = ...;
CMTime presentationTimeStamp = ...;
CMTime decompressTimeStamp = ...;

CMSampleTimingInfo timingInfo{duration, presentationTimeStamp, decompressTimeStamp};

_sampleTimingArray[0] = timingInfo;
_sampleSizeArray[0] = nalLength;

// Wrap the CMBlockBuffer...
status = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, NULL, _formatDescription, 1, 1, _sampleTimingArray, 1, _sampleSizeArray, &sampleBuffer);

Then, decode the frame... It is worth trying to get the frames out in display order with the flags.

VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression | kVTDecodeFrame_EnableTemporalProcessing;
VTDecodeInfoFlags flagOut;

VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
                                      (void*)CFBridgingRetain(NULL), &flagOut);

On the callback side of things, we need a way of sorting the CVImageBufferRefs we receive. I use a struct that contains the CVImageBufferRef and the PTS. Then a vector with a size of two that will do the actual sorting.

struct Buffer
{
    CVImageBufferRef imageBuffer = NULL;
    double pts = 0;
};

std::vector <Buffer> _buffer;

We also need a way to sort the Buffers. Always writing to and reading from the index with the lowest PTS works well.

 -(int) getMinIndex
 {
     if(_buffer[0].pts > _buffer[1].pts)
     {
         return 1;
     }

     return 0;
 }

In the callback, we need to fill the vector with Buffers...

 void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef imageBuffer, CMTime presentationTimeStamp, CMTime presentationDuration)
 {
    StreamManager *streamManager = (__bridge StreamManager     *)decompressionOutputRefCon;

    @synchronized(streamManager)
    {
    if (status != noErr)
    {
        NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
        NSLog(@"Decompressed error: %@", error);
    }
    else
    {
        // Get the PTS
        double pts = CMTimeGetSeconds(presentationTimeStamp);

        // Fill our buffer initially
        if(!streamManager->_bufferReady)
        {
            Buffer buffer;

            buffer.pts = pts;
            buffer.imageBuffer = imageBuffer;

            CVBufferRetain(buffer.imageBuffer);

            streamManager->_buffer[streamManager->_bufferIndex++] = buffer;
        }
        else
        {
            // Push new buffers to the index with the lowest PTS
            int index = [streamManager getMinIndex];

            // Release the old CVImageBufferRef
            CVBufferRelease(streamManager->_buffer[index].imageBuffer);

            Buffer buffer;

            buffer.pts = pts;
            buffer.imageBuffer = imageBuffer;

            // Retain the new CVImageBufferRef
            CVBufferRetain(buffer.imageBuffer);

            streamManager->_buffer[index] = buffer;
        }

        // Wrap around the buffer when initialized
        // _bufferWindow = 2
        if(streamManager->_bufferIndex == streamManager->_bufferWindow)
        {
            streamManager->_bufferReady = YES;
            streamManager->_bufferIndex = 0;
        }
    }
}
}

Finally we need to drain the Buffers in temporal (display) order...

 - (void)drainBuffer
 {
      @synchronized(self)
      {
         if(_bufferReady)
         {
             // Drain buffers from the index with the lowest PTS
             int index = [self getMinIndex];

             Buffer buffer = _buffer[index];

             // Do something useful with the buffer now in display order
         }
       }
 }

maddanio · Answer

I would like to improve upon that answer a bit. While the outlined solution works, it requires knowledge of the number of frames needed to produce an output frame. The example uses a buffer size of 2, but in my case I needed a buffer size of 3. To avoid having to specify this in advance one can make use of the fact, that frames (in display order) align exactly in terms of pts/duration. I.e. the end of one frame is exactly the beginning of the next. Thus one can simply accumulate frames until there is no "gap" at the beginning, then pop the first frame, and so on. Also one can take the pts of the first frame (which is always an I-frame) as the initial "head" (as it does not have to be zero...). Here is some code that does this:

#include <CoreVideo/CVImageBuffer.h>

#include <boost/container/flat_set.hpp>

inline bool operator<(const CMTime& left, const CMTime& right)
{
    return CMTimeCompare(left, right) < 0;
}

inline bool operator==(const CMTime& left, const CMTime& right)
{
    return CMTimeCompare(left, right) == 0;
}

inline CMTime operator+(const CMTime& left, const CMTime& right)
{
    return CMTimeAdd(left, right);
}

class reorder_buffer_t
{
public:

    struct entry_t
    {
        CFGuard<CVImageBufferRef> image;
        CMTime pts;
        CMTime duration;
        bool operator<(const entry_t& other) const
        {
            return pts < other.pts;
        }
    };

private:

    typedef boost::container::flat_set<entry_t> buffer_t;

public:

    reorder_buffer_t()
    {
    }

    void push(entry_t entry)
    {
        if (!_head)
            _head = entry.pts;
        _buffer.insert(std::move(entry));
    }

    bool empty() const
    {
        return _buffer.empty();
    }

    bool ready() const
    {
        return !empty() && _buffer.begin()->pts == _head;
    }

    entry_t pop()
    {
        assert(ready());
        auto entry = *_buffer.begin();
        _buffer.erase(_buffer.begin());
        _head = entry.pts + entry.duration;
        return entry;
    }

    void clear()
    {
        _buffer.clear();
        _head = boost::none;
    }

private:

    boost::optional<CMTime> _head;
    buffer_t _buffer;
};

Declan Stewart McPartlin · Answer

Here's a solution that works with any required buffer size, and also does not need any 3rd party libraries. My C++ code might not be the best, but it works.

We create a Buffer struct to identify the buffers by pts:

struct Buffer
{
    CVImageBufferRef imageBuffer = NULL;
    uint64_t pts = 0;
};

In our decoder, we need to keep track of the buffers, and what pts we want to release next:

@property (nonatomic) std::vector <Buffer> buffers;
@property (nonatomic, assign) uint64_t nextExpectedPts;

Now we are ready to handle the buffers coming in. In my case the buffers were provided asynchronously. Make sure you provide the correct duration and presentation timestamp values to the decompressionsession to be able to sort them properly:

-(void)handleImageBuffer:(CVImageBufferRef)imageBuffer pts:(CMTime)presentationTimeStamp duration:(uint64_t)duration {
    //Situation 1, we can directly pass over this buffer
    if (self.nextExpectedPts == presentationTimeStamp.value || duration == 0) {
        [self sendImageBuffer:imageBuffer duration:duration];
        return;
    }
    //Situation 2, we got this buffer too fast. We will store it, but first we check if we have already stored the expected buffer
    Buffer futureBuffer = [self bufferWithImageBuffer:imageBuffer pts:presentationTimeStamp.value];
    int smallestPtsInBufferIndex = [self getSmallestPtsBufferIndex];
    if (smallestPtsInBufferIndex >= 0 && self.nextExpectedPts == self.buffers[smallestPtsInBufferIndex].pts) {
        //We found the next buffer, lets store the current buffer and return this one
        Buffer bufferWithSmallestPts = self.buffers[smallestPtsInBufferIndex];
        [self sendImageBuffer:bufferWithSmallestPts.imageBuffer duration:duration];
        CVBufferRelease(bufferWithSmallestPts.imageBuffer);
        [self setBuffer:futureBuffer atIndex:smallestPtsInBufferIndex];
    } else {
        //We dont have the next buffer yet, lets store this one to a new slot
        [self setBuffer:futureBuffer atIndex:self.buffers.size()];
    }
}

-(Buffer)bufferWithImageBuffer:(CVImageBufferRef)imageBuffer pts:(uint64_t)pts {
    Buffer futureBuffer = Buffer();
    futureBuffer.pts = pts;
    futureBuffer.imageBuffer = imageBuffer;
    CVBufferRetain(futureBuffer.imageBuffer);
    return futureBuffer;
}

- (void)sendImageBuffer:(CVImageBufferRef)imageBuffer duration:(uint64_t)duration {
    //Send your buffer to wherever you need it here
    self.nextExpectedPts += duration;
}

-(int) getSmallestPtsBufferIndex
{
    int minIndex = -1;
    uint64_t minPts = 0;
    for(int i=0;i<_buffers.size();i++) {
        if (_buffers[i].pts < minPts || minPts == 0) {
            minPts = _buffers[i].pts;
            minIndex = i;
        }
    }
    return minIndex;
}

- (void)setBuffer:(Buffer)buffer atIndex:(int)index {
    if (_buffers.size() <= index) {
        _buffers.push_back(buffer);
    } else {
        _buffers[index] = buffer;
    }
}

Do not forget to release all the buffers in the vector when deallocating your decoder, and if you're working with a looping file for example, keep track of when the file has fully looped to reset the nextExpectedPts and such.

Image buffer display order with VTDecompressionSession

Tags:

ios

video-toolbox

core-video

Kaleb

3 Answers

Kaleb

maddanio

Declan Stewart McPartlin

Recent Activity

Donate For Us

Image buffer display order with VTDecompressionSession

Tags:

ios

video-toolbox

core-video

Kaleb

3 Answers

Kaleb

maddanio

Declan Stewart McPartlin

Related questions

Recent Activity

Donate For Us