How to get the Y component from CMSampleBuffer resulted from the AVCaptureSession?

Question

Hey there, I am trying to access raw data from the iPhone camera using AVCaptureSession. I followed the guide provided by Apple (link here).

The raw data in the sample buffer is in YUV format (am I correct about the raw video frame format?). How can I directly obtain the Y component from the raw data stored in the sample buffer?

Brad Larson · Accepted Answer

When setting up the AVCaptureVideoDataOutput that returns the raw camera frames, you can set the format of the frames using code like the following:

[videoOutput setVideoSettings:[NSDictionary dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_32BGRA] forKey:(id)kCVPixelBufferPixelFormatTypeKey]];

In this case a BGRA pixel format is specified (I used this to match the color format of an OpenGL ES texture). Each pixel in that format has one byte each for blue, green, red, and alpha, in that order. Going with this makes it easy to pull out color components, but you do sacrifice a little performance because the frames have to be converted from the camera-native YUV colorspace.

Other supported pixel formats are kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and kCVPixelFormatType_420YpCbCr8BiPlanarFullRange on newer devices and kCVPixelFormatType_422YpCbCr8 on the iPhone 3G. The VideoRange or FullRange suffix simply indicates whether the bytes are returned in the range 16 - 235 for Y and 16 - 240 for UV, or in the full range 0 - 255 for each component.
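To illustrate the VideoRange versus FullRange distinction, here is a minimal C sketch that stretches video-range luma (16 - 235) to full range (0 - 255). The function names are hypothetical and not part of any Apple API, and it assumes you have already copied the Y bytes into a tightly packed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand one video-range luma value (nominally 16..235)
   to full range (0..255). Values outside the nominal range are clamped. */
static uint8_t luma_video_to_full(uint8_t y)
{
    if (y <= 16)  return 0;
    if (y >= 235) return 255;
    /* The 219 video-range steps (235 - 16) are stretched over 255 steps;
       adding 109 (about half of 219) rounds to the nearest integer. */
    return (uint8_t)(((y - 16) * 255 + 109) / 219);
}

/* Hypothetical helper: convert a tightly packed run of luma bytes in place. */
static void luma_buffer_video_to_full(uint8_t *luma, size_t count)
{
    assert(luma != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        luma[i] = luma_video_to_full(luma[i]);
    }
}
```

If you request the FullRange format from the capture output instead, this step should not be needed.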

I believe the default colorspace used by an AVCaptureVideoDataOutput instance is the YUV 4:2:0 planar colorspace (except on the iPhone 3G, where it's YUV 4:2:2 interleaved). This means that there are two planes of image data contained within the video frame, with the Y plane coming first. For every pixel in your resulting image, there is one byte for the Y value at that pixel.
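As a rough sketch of what that two-plane layout means in bytes, the C snippet below computes the nominal plane sizes for a 4:2:0 bi-planar frame. The struct and function names are made up for illustration, and row padding is deliberately ignored, so on a real buffer you should rely on the per-plane values reported by CoreVideo instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: nominal byte counts of the two planes, without row padding. */
typedef struct {
    size_t luma_bytes;    /* plane 0: one Y byte per pixel */
    size_t chroma_bytes;  /* plane 1: one Cb/Cr byte pair per 2x2 block of pixels */
} PlaneSizes;

/* Hypothetical helper: nominal plane sizes for a 4:2:0 bi-planar frame. */
static PlaneSizes biplanar_420_sizes(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    PlaneSizes sizes;
    sizes.luma_bytes = width * height;
    /* Chroma is subsampled by two in both directions, two bytes per sample pair. */
    sizes.chroma_bytes = 2 * ((width + 1) / 2) * ((height + 1) / 2);
    return sizes;
}
```

For a 640x480 frame this gives 307200 luma bytes and 153600 chroma bytes, so the Y plane accounts for two thirds of the pixel data.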

You would get at this raw Y data by implementing something like this in your delegate callback:

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
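    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);

    // For the bi-planar YUV formats, plane 0 holds the Y (luma) bytes
    unsigned char *rawPixelBase = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
    size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);

    // Do something with the raw pixels here

    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);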
}

You could then figure out the location in the frame data for each X, Y coordinate on the image and pull the byte out that corresponds to the Y component at that coordinate.
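The lookup described above can be sketched in plain C as follows. The function name luma_at is hypothetical, and the sketch assumes you pass in the base address and bytes-per-row of the Y plane (plane 0) rather than of the whole pixel buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the Y byte at pixel (x, y) of a luma plane.
   bytes_per_row may be larger than width when rows are padded. */
static uint8_t luma_at(const uint8_t *plane, size_t bytes_per_row,
                       size_t width, size_t height, size_t x, size_t y)
{
    assert(plane != NULL);
    assert(bytes_per_row >= width);
    assert(x < width && y < height);
    /* Rows are bytes_per_row apart; within a row there is one byte per pixel. */
    return plane[y * bytes_per_row + x];
}
```

Using the row stride rather than the image width for the vertical step keeps the lookup correct even if the rows carry padding bytes.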

Apple's FindMyiCone sample from WWDC 2010 (accessible along with the videos) shows how to process raw BGRA data from each frame. I also created a sample application, which you can download the code for here, that performs color-based object tracking using the live video from the iPhone's camera. Both show how to process raw pixel data, but neither of these work in the YUV colorspace.

Codo · Answer

In addition to Brad's answer, and your own code, you want to consider the following:

Since your image has two separate planes, the function CVPixelBufferGetBaseAddress will not return the base address of a plane but rather the base address of an additional data structure. It is probably only due to the current implementation that the address is close enough to the first plane for you to see the image at all, and it is the reason the image is shifted and has garbage at the top left. The correct way to get the first plane is:

unsigned char *rowBase = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);

A row in the image might be longer than the width of the image (due to rounding). That's why there are separate functions for getting the width and the number of bytes per row. You don't have this problem at the moment. But that might change with the next version of iOS. So your code should be:

size_t bufferHeight = CVPixelBufferGetHeight(pixelBuffer);
size_t bufferWidth = CVPixelBufferGetWidth(pixelBuffer);
size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
size_t size = bufferHeight * bytesPerRow;

// Each row occupies bytesPerRow bytes, of which only the first bufferWidth are pixels
unsigned char *pixel = (unsigned char *)malloc(size);

unsigned char *rowBase = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
memcpy(pixel, rowBase, size);
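If you would rather end up with a tightly packed width x height buffer (no padding bytes at the end of each row), one possible approach is to copy row by row. This is only a sketch in plain C: copy_plane_tight is a hypothetical helper, and it assumes you hand it the base address and bytes-per-row of plane 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a possibly padded plane into a newly allocated,
   tightly packed buffer of width * height bytes.
   Returns NULL if allocation fails; the caller must free() the result. */
static uint8_t *copy_plane_tight(const uint8_t *plane, size_t bytes_per_row,
                                 size_t width, size_t height)
{
    assert(plane != NULL);
    assert(bytes_per_row >= width);
    uint8_t *out = (uint8_t *)malloc(width * height);
    if (out == NULL) {
        return NULL;
    }
    for (size_t row = 0; row < height; row++) {
        /* Copy only the pixel bytes of each row and skip the padding. */
        memcpy(out + row * width, plane + row * bytes_per_row, width);
    }
    return out;
}
```

When bytes_per_row equals width this degenerates to the single memcpy shown above.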

Please also note that your code will fail miserably on an iPhone 3G, which delivers interleaved YUV 4:2:2 frames rather than planar ones.

Tafkadasoh · Answer

If you only need the luminance channel, I recommend against using the BGRA format, as it comes with a conversion overhead. Apple suggests using BGRA if you're doing rendering, but you don't need it for extracting the luminance information. As Brad already mentioned, the most efficient format is the camera-native YUV format.

However, extracting the right bytes from the sample buffer is a bit tricky, especially on the iPhone 3G with its interleaved YUV 4:2:2 format. So here is my code, which works fine with the iPhone 3G, 3GS, iPod Touch 4 and iPhone 4S.

#pragma mark -
#pragma mark AVCaptureVideoDataOutputSampleBufferDelegate Methods
#if !(TARGET_IPHONE_SIMULATOR)
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    // get image buffer reference
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);

    // extract the needed information from the image buffer
    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    size_t bufferSize = CVPixelBufferGetDataSize(imageBuffer);
    void *baseAddress = CVPixelBufferGetBaseAddress(imageBuffer);
    CGSize resolution = CGSizeMake(CVPixelBufferGetWidth(imageBuffer), CVPixelBufferGetHeight(imageBuffer));

    // variables for grayscaleBuffer 
    void *grayscaleBuffer = 0;
    size_t grayscaleBufferSize = 0;

    // the pixelFormat differs between iPhone 3G and later models
    OSType pixelFormat = CVPixelBufferGetPixelFormatType(imageBuffer);

    if (pixelFormat == '2vuy') { // iPhone 3G
        // kCVPixelFormatType_422YpCbCr8     = '2vuy',    
        /* Component Y'CbCr 8-bit 4:2:2, ordered Cb Y'0 Cr Y'1 */

        // copy every second byte (luminance bytes from the Y channel) to a new buffer
        grayscaleBufferSize = bufferSize/2;
        grayscaleBuffer = malloc(grayscaleBufferSize);
        if (grayscaleBuffer == NULL) {
            NSLog(@"ERROR in %@:%@:%d: couldn't allocate memory for grayscaleBuffer!", NSStringFromClass([self class]), NSStringFromSelector(_cmd), __LINE__);
            CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
            return;
        }
        // Y' bytes sit at the odd offsets (Cb Y'0 Cr Y'1), so start at byte 1 and step by 2
        const unsigned char *sourceMemPos = (const unsigned char *)baseAddress + 1;
        unsigned char *destinationMemPos = (unsigned char *)grayscaleBuffer;
        unsigned char *destinationEnd = destinationMemPos + grayscaleBufferSize;
        while (destinationMemPos < destinationEnd) {
            *destinationMemPos = *sourceMemPos;
            destinationMemPos += 1;
            sourceMemPos += 2;
        }
    }

    if (pixelFormat == '420v' || pixelFormat == '420f') {
        // kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange = '420v', 
        // kCVPixelFormatType_420YpCbCr8BiPlanarFullRange  = '420f',
        // Bi-Planar Component Y'CbCr 8-bit 4:2:0, video-range (luma=[16,235] chroma=[16,240]).  
        // Bi-Planar Component Y'CbCr 8-bit 4:2:0, full-range (luma=[0,255] chroma=[1,255]).
        // CVPixelBufferGetBaseAddress would point to a big-endian CVPlanarPixelBufferInfo_YCbCrBiPlanar struct,
        // so ask for plane 0 directly; the Y channel is the first of the two planes
        // and takes up roughly the first two thirds of the pixel data
        size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);
        baseAddress = CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
        grayscaleBufferSize = CVPixelBufferGetHeightOfPlane(imageBuffer, 0) * bytesPerRow;
        grayscaleBuffer = malloc(grayscaleBufferSize);
        if (grayscaleBuffer == NULL) {
            NSLog(@"ERROR in %@:%@:%d: couldn't allocate memory for grayscaleBuffer!", NSStringFromClass([self class]), NSStringFromSelector(_cmd), __LINE__);
            CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
            return;
        }
        memcpy(grayscaleBuffer, baseAddress, grayscaleBufferSize);
    }

    // do whatever you want with the grayscale buffer
    ...

    // clean-up: release the lock taken above, then free the copy
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
    free(grayscaleBuffer);
}
#endif
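For the interleaved '2vuy' case, here is a stride-aware variant sketched in plain C. It is an illustration under stated assumptions rather than a drop-in replacement: extract_luma_2vuy is a hypothetical name, and the caller is assumed to pass the buffer's base address, bytes-per-row, width and height as reported by CoreVideo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: pull the Y' bytes out of an interleaved 4:2:2 buffer
   ordered Cb Y'0 Cr Y'1, where every pixel occupies two bytes and the luma
   byte is the second one. Returns a malloc'd width * height buffer, or NULL
   if allocation fails; the caller must free() it. */
static uint8_t *extract_luma_2vuy(const uint8_t *src, size_t bytes_per_row,
                                  size_t width, size_t height)
{
    assert(src != NULL);
    assert(bytes_per_row >= 2 * width);
    uint8_t *luma = (uint8_t *)malloc(width * height);
    if (luma == NULL) {
        return NULL;
    }
    for (size_t row = 0; row < height; row++) {
        const uint8_t *in = src + row * bytes_per_row;
        uint8_t *out = luma + row * width;
        for (size_t x = 0; x < width; x++) {
            out[x] = in[2 * x + 1];  /* odd offsets carry luma */
        }
    }
    return luma;
}
```

Walking the source row by row means any padding at the end of a row is skipped instead of being copied into the grayscale data.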

Awesomeness · Answer

This is simply the culmination of everyone else's hard work, above and on other threads, converted to Swift 3 for anyone who finds it useful.

func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
        CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)

        let pixelFormatType = CVPixelBufferGetPixelFormatType(pixelBuffer)
        if pixelFormatType == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
           || pixelFormatType == kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange {

            let bufferHeight = CVPixelBufferGetHeight(pixelBuffer)
            let bufferWidth = CVPixelBufferGetWidth(pixelBuffer)

            let lumaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
            let size = bufferHeight * lumaBytesPerRow
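            // bail out (and release the lock) if the plane has no base address
            guard let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) else {
                CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
                return
            }
            let lumaByteBuffer = lumaBaseAddress.assumingMemoryBound(to: UInt8.self)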

            let releaseDataCallback: CGDataProviderReleaseDataCallback = { (info: UnsafeMutableRawPointer?, data: UnsafeRawPointer, size: Int) -> () in
                // https://developer.apple.com/reference/coregraphics/cgdataproviderreleasedatacallback
                // N.B. 'CGDataProviderRelease' is unavailable: Core Foundation objects are automatically memory managed
                return
            }

            if let dataProvider = CGDataProvider(dataInfo: nil, data: lumaByteBuffer, size: size, releaseData: releaseDataCallback) {
                let colorSpace = CGColorSpaceCreateDeviceGray()
                // an 8-bit grayscale image has no alpha channel
                let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue)

                if let cgImage = CGImage(width: bufferWidth, height: bufferHeight, bitsPerComponent: 8, bitsPerPixel: 8, bytesPerRow: lumaBytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo, provider: dataProvider, decode: nil, shouldInterpolate: false, intent: CGColorRenderingIntent.defaultIntent) {
                    // the image references the pixel buffer's memory, so use or copy it before the buffer is unlocked
                    let greyscaleImage = UIImage(cgImage: cgImage)
                    // do what you want with the greyscale image.
                }
            }
        }

        CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
    }
}
