I'm working on an app that uses the video feed from the DJI Mavic 2 and runs it through a machine learning model to identify objects.
I managed to get my app to preview the feed from the drone using this sample DJI project, but I'm having a lot of trouble trying to get the video data into a format that's usable by the Vision framework.
I used this example from Apple as a guide to create my model (which is working!), but it looks like I need to create a VNImageRequestHandler object, which in that example is initialized with a CVPixelBuffer extracted from a CMSampleBuffer, in order to use Vision.
Any idea how to make this conversion? Is there a better way to do this?
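For context, in Apple's example the pixel buffer comes straight out of the camera's capture callback, roughly like this (a sketch of the standard AVCaptureVideoDataOutputSampleBufferDelegate pattern, with requests being my array of Vision requests):

// Inside an AVCaptureVideoDataOutputSampleBufferDelegate (import AVFoundation and Vision):
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // CMSampleBufferGetImageBuffer pulls the CVPixelBuffer out of the sample buffer
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    do {
        try handler.perform(self.requests)
    } catch {
        print(error)
    }
}

The DJI SDK hands me raw Data instead, so there is no CMSampleBuffer to start from. Here's my current code: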
class DJICameraViewController: UIViewController, DJIVideoFeedListener, DJISDKManagerDelegate, DJICameraDelegate, VideoFrameProcessor {

    // ...

    func videoFeed(_ videoFeed: DJIVideoFeed, didUpdateVideoData rawData: Data) {
        let videoData = rawData as NSData
        let videoBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: videoData.length)
        videoData.getBytes(videoBuffer, length: videoData.length)
        DJIVideoPreviewer.instance().push(videoBuffer, length: Int32(videoData.length))
    }

    // MARK: VideoFrameProcessor Protocol Implementation

    func videoProcessorEnabled() -> Bool {
        // This is never called
        return true
    }

    func videoProcessFrame(_ frame: UnsafeMutablePointer<VideoFrameYUV>!) {
        // This is never called
        // cv_pixelbuffer_fastupload is a raw pointer, so reinterpret it as a CVPixelBuffer
        guard let fastUpload = frame.pointee.cv_pixelbuffer_fastupload else { return }
        let pixelBuffer = unsafeBitCast(fastUpload, to: CVPixelBuffer.self)
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                        orientation: exifOrientationFromDeviceOrientation(),
                                                        options: [:])
        do {
            try imageRequestHandler.perform(self.requests)
        } catch {
            print(error)
        }
    }
} // End of DJICameraViewController class
EDIT: From what I've gathered from DJI's (spotty) documentation, it looks like the video feed is H.264-compressed. They claim DJIWidget includes helper methods for decompression, but I haven't had any success figuring out how to use them correctly, since there is no documentation surrounding their use.
EDIT 2: Here's the issue I created on GitHub for the DJIWidget framework
EDIT 3: Updated the code snippet with additional methods for VideoFrameProcessor, removing old code from the videoFeed method
EDIT 4: Details about how to extract the pixel buffer successfully and utilize it can be found in this comment from GitHub
The steps:

1. Call DJIVideoPreviewer's push:length: method and feed it the rawData. (If you are using VideoPreviewerSDKAdapter, skip this step.) H.264 parsing and decoding are performed once you do this.
2. Conform to the VideoFrameProcessor protocol and call DJIVideoPreviewer.registFrameProcessor to register the VideoFrameProcessor protocol object (see the sketch after this list).
3. The VideoFrameProcessor protocol's videoProcessFrame: method will output the VideoFrameYUV data.
4. Get the CVPixelBuffer data. The VideoFrameYUV struct has a cv_pixelbuffer_fastupload field; this is actually of type CVPixelBuffer when hardware decoding is turned on. If you are using software decoding, you will need to create a CVPixelBuffer yourself and copy the data from the VideoFrameYUV's luma, chromaB and chromaR fields, as in the Objective-C snippet below.
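For step 2, the registration might look something like this in Swift. This is a minimal sketch, not code from the GitHub comment: setupVideoPreviewer is a hypothetical helper, and it assumes DJIVideoPreviewer is already receiving data via push:length: as in step 1, with DJIWidget's enableHardwareDecode flag turning on hardware decoding:

import DJIWidget

func setupVideoPreviewer() {
    // Hypothetical helper, called once during setup (e.g. from viewDidLoad).
    guard let previewer = DJIVideoPreviewer.instance() else { return }
    previewer.enableHardwareDecode = true  // populates cv_pixelbuffer_fastupload (step 4)
    previewer.registFrameProcessor(self)   // step 2: register this VideoFrameProcessor
    previewer.start()
}

Without the registFrameProcessor call, the videoProcessorEnabled and videoProcessFrame callbacks never fire, which would explain the "This is never called" comments in the snippet above.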
Code (Objective-C, for the software-decoding case):

VideoFrameYUV *yuvFrame; // the VideoFrameProcessor output
CVPixelBufferRef pixelBuffer = NULL;
CVReturn result = CVPixelBufferCreate(kCFAllocatorDefault,
                                      yuvFrame->width,
                                      yuvFrame->height,
                                      kCVPixelFormatType_420YpCbCr8Planar,
                                      NULL,
                                      &pixelBuffer);
if (result != kCVReturnSuccess || pixelBuffer == NULL) {
    return;
}
if (CVPixelBufferLockBaseAddress(pixelBuffer, 0) != kCVReturnSuccess) {
    CVPixelBufferRelease(pixelBuffer);
    return;
}

// Plane 0 is luma (Y); planes 1 and 2 are the subsampled chroma planes (Cb, Cr).
size_t yPlaneWidth  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
size_t yPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
size_t uPlaneWidth  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 1);
size_t uPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);
size_t vPlaneWidth  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 2);
size_t vPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 2);

// Note: these copies assume each plane is tightly packed (bytes-per-row == width);
// otherwise, copy row by row using CVPixelBufferGetBytesPerRowOfPlane.
uint8_t *yDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
memcpy(yDestination, yuvFrame->luma, yPlaneWidth * yPlaneHeight);
uint8_t *uDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
memcpy(uDestination, yuvFrame->chromaB, uPlaneWidth * uPlaneHeight);
uint8_t *vDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 2);
memcpy(vDestination, yuvFrame->chromaR, vPlaneWidth * vPlaneHeight);

CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
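A Swift version of the same fallback logic might look like this. It's a sketch, not DJI's API: pixelBuffer(from:) is a hypothetical helper, and like the Objective-C snippet it assumes the VideoFrameYUV planes are tightly packed:

import Foundation
import CoreVideo
import DJIWidget

func pixelBuffer(from frame: UnsafeMutablePointer<VideoFrameYUV>) -> CVPixelBuffer? {
    // Hardware decoding: DJIWidget already hands us a CVPixelBuffer.
    if let fastUpload = frame.pointee.cv_pixelbuffer_fastupload {
        return unsafeBitCast(fastUpload, to: CVPixelBuffer.self)
    }
    // Software decoding: build a planar 4:2:0 buffer from luma/chromaB/chromaR.
    var buffer: CVPixelBuffer?
    guard CVPixelBufferCreate(kCFAllocatorDefault,
                              Int(frame.pointee.width),
                              Int(frame.pointee.height),
                              kCVPixelFormatType_420YpCbCr8Planar,
                              nil,
                              &buffer) == kCVReturnSuccess,
          let pixelBuffer = buffer else { return nil }
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }
    let planes: [(source: UnsafeMutablePointer<UInt8>?, index: Int)] = [
        (frame.pointee.luma, 0),     // Y
        (frame.pointee.chromaB, 1),  // Cb
        (frame.pointee.chromaR, 2)   // Cr
    ]
    for (source, index) in planes {
        guard let source = source,
              let destination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, index)
        else { return nil }
        // Assumes bytes-per-row == width for each plane, as in the snippet above.
        memcpy(destination,
               source,
               CVPixelBufferGetWidthOfPlane(pixelBuffer, index) *
               CVPixelBufferGetHeightOfPlane(pixelBuffer, index))
    }
    return pixelBuffer
}

videoProcessFrame can then call this helper and hand the result to VNImageRequestHandler exactly as in the class above.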