 

How can I take a photo of a detected rectangle in the Apple Vision framework

How can I take a photo (get a CIImage) from a successful VNRectangleObservation object?

I have a video capture session running, and in func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) I do the per-frame handling:

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    do {
        // handler and request are properties set up elsewhere, e.g. a
        // VNSequenceRequestHandler and a VNDetectRectanglesRequest
        try handler.perform([request], on: pixelBuffer)
    } catch {
        print(error)
    }
}

Should I save the pixel buffer that I pass to the handler somewhere and work on that buffer? It's a damn shame that I can't access the image as a property of the observation object :(

Any ideas?

asked Jan 09 '18 by denis631


1 Answer

So you're using a Vision request that produces VNRectangleObservations, and you want to pull out the regions of the subject image identified by those observations? Maybe perspective-project them, too, so that they're rectangular in the image plane? (There's a demo of this in the Vision session from WWDC17.)

You can extract and rectify the region with the CIPerspectiveCorrection filter from Core Image. To set that up, you'll need to pass the observation's corner points, converted from Vision's normalized coordinates to the image's pixel coordinates. That looks something like this:

func extractPerspectiveRect(_ observation: VNRectangleObservation, from buffer: CVImageBuffer) -> CIImage {
    // get the pixel buffer into Core Image
    let ciImage = CIImage(cvImageBuffer: buffer)

    // convert corners from normalized image coordinates to pixel coordinates
    let topLeft = observation.topLeft.scaled(to: ciImage.extent.size)
    let topRight = observation.topRight.scaled(to: ciImage.extent.size)
    let bottomLeft = observation.bottomLeft.scaled(to: ciImage.extent.size)
    let bottomRight = observation.bottomRight.scaled(to: ciImage.extent.size)

    // pass those to the filter to extract/rectify the image
    return ciImage.applyingFilter("CIPerspectiveCorrection", parameters: [
        "inputTopLeft": CIVector(cgPoint: topLeft),
        "inputTopRight": CIVector(cgPoint: topRight),
        "inputBottomLeft": CIVector(cgPoint: bottomLeft),
        "inputBottomRight": CIVector(cgPoint: bottomRight),
    ])
}

Aside: The scaled function above is a convenience extension on CGPoint to make coordinate math a bit smaller at the call site:

extension CGPoint {
    func scaled(to size: CGSize) -> CGPoint {
        return CGPoint(x: self.x * size.width,
                       y: self.y * size.height)
    }
}
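
For context, here's a minimal sketch of how this might plug into the capture callback from the question. The class name, the per-frame VNDetectRectanglesRequest, and the idea of capturing the pixel buffer in the request's completion handler are illustrative assumptions, not something Vision prescribes:

import AVFoundation
import Vision

// Assumes extractPerspectiveRect(_:from:) and the CGPoint.scaled(to:)
// extension shown above are defined at file scope.
final class RectangleCapturer: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let sequenceHandler = VNSequenceRequestHandler()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // The completion handler closes over this frame's pixel buffer,
        // so each observation stays paired with the image it came from.
        let request = VNDetectRectanglesRequest { request, _ in
            guard let observation = request.results?.first as? VNRectangleObservation else { return }
            let rectified = extractPerspectiveRect(observation, from: pixelBuffer)
            // hand `rectified` (a CIImage) off to your rendering or saving code
            _ = rectified
        }

        try? sequenceHandler.perform([request], on: pixelBuffer)
    }
}

Closing over the buffer per frame is one way to keep the observation paired with its source image; storing the most recent buffer in a property, as the question suggests, works as well.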

Now, that gets you a CIImage object. A CIImage isn't really a displayable image itself; it's a recipe for how to process and produce an image, and it can be rendered in many different ways. Most of those ways involve a CIContext: you can have it render into another pixel buffer, or into a Metal texture if you're doing this processing in real time. On the other hand, if you're only displaying the occasional static image, you can create a UIImage directly from the CIImage and hand it to a UIImageView, and UIKit will manage the underlying CIContext and rendering for you.
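
For example, here's a minimal sketch of both routes; the helper name and the shared-context setup are just for illustration:

import CoreImage
import UIKit

// Reuse one CIContext; creating a context is comparatively expensive.
let sharedCIContext = CIContext()

// Renders a CIImage (such as the one returned by extractPerspectiveRect) to a UIImage.
func renderToUIImage(_ ciImage: CIImage) -> UIImage? {
    guard let cgImage = sharedCIContext.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}

// Or, for occasional static display, let UIKit drive the rendering:
// imageView.image = UIImage(ciImage: rectifiedCIImage)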

answered by rickster