I am attempting to find a simple way in SceneKit to calculate the depth of a pixel using the LiDAR data from
sceneView.session.currentFrame?.smoothedSceneDepth?.depthMap
Ideally I don't want to use Metal shaders. I would prefer to find points in my currentFrame
and their corresponding depth map values, so I can get the depth of those points in SceneKit (ideally in world coordinates, not just local to that frustum at that point in time).
Fast performance isn't necessary, as it won't be calculated at capture time.
I am aware of the Apple project at link; however, it is far too complex for my needs.
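At its simplest, what I need is: given a point in the captured image, read the corresponding Float32 value out of the depth map. A minimal sketch of just that step (the depth(at:in:) helper name is mine, and it assumes the depth map and the captured image share the same aspect ratio):
import ARKit

/// Minimal sketch: read the depth (in metres) at a single captured-image pixel.
/// Assumes the depth map and the captured image share the same aspect ratio.
func depth(at imagePoint: CGPoint, in frame: ARFrame) -> Float? {
    guard let depthMap = frame.smoothedSceneDepth?.depthMap ?? frame.sceneDepth?.depthMap else { return nil }

    let depthWidth = CVPixelBufferGetWidth(depthMap)
    let depthHeight = CVPixelBufferGetHeight(depthMap)
    let imageWidth = CVPixelBufferGetWidth(frame.capturedImage)
    let imageHeight = CVPixelBufferGetHeight(frame.capturedImage)

    // Scale the captured-image coordinate down to depth-map coordinates.
    let u = Int(imagePoint.x * CGFloat(depthWidth) / CGFloat(imageWidth))
    let v = Int(imagePoint.y * CGFloat(depthHeight) / CGFloat(imageHeight))
    guard (0..<depthWidth).contains(u), (0..<depthHeight).contains(v) else { return nil }

    // The scene depth map is kCVPixelFormatType_DepthFloat32.
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }
    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return nil }
    let rowBytes = CVPixelBufferGetBytesPerRow(depthMap)
    let rowPointer = base.advanced(by: v * rowBytes).assumingMemoryBound(to: Float32.self)
    return rowPointer[u]
}
That covers reading a depth value at a point; the harder part is turning it into world coordinates.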
As a starting point, my code works like this:
guard let depthData = frame.sceneDepth else { return }
let camera = frame.camera
let depthPixelBuffer = depthData.depthMap
let depthHeight = CVPixelBufferGetHeight(depthPixelBuffer)
let depthWidth = CVPixelBufferGetWidth(depthPixelBuffer)

// Resize the captured image down to the depth map's resolution.
let resizeScale = CGFloat(depthWidth) / CGFloat(CVPixelBufferGetWidth(frame.capturedImage))
let resizedColorImage = frame.capturedImage.toCGImage(scale: resizeScale)
guard let colorData = resizedColorImage.pixelData() else {
    fatalError()
}

// Scale the camera intrinsics from the captured-image resolution down to the depth-map resolution.
var intrinsics = camera.intrinsics
let referenceDimensions = camera.imageResolution
let ratio = Float(referenceDimensions.width) / Float(depthWidth)
intrinsics.columns.0[0] /= ratio
intrinsics.columns.1[1] /= ratio
intrinsics.columns.2[0] /= ratio
intrinsics.columns.2[1] /= ratio

// Build the cloud: x and y are normalised image coordinates, z is the (negated) depth.
var points: [SCNVector3] = []
let depthValues = depthPixelBuffer.depthValues()
for vv in 0..<depthHeight {
    for uu in 0..<depthWidth {
        let z = -depthValues[uu + vv * depthWidth]
        let x = Float32(uu) / Float32(depthWidth) * 2.0 - 1.0
        let y = 1.0 - Float32(vv) / Float32(depthHeight) * 2.0
        points.append(SCNVector3(x, y, z))
    }
}
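Here toCGImage(scale:), pixelData() and depthValues() are small helper extensions that aren't shown. For reference, depthValues() could be implemented roughly like this (assuming the DepthFloat32 pixel format of the scene depth map):
import CoreVideo

extension CVPixelBuffer {
    /// Possible implementation of the helper used above: copies the
    /// kCVPixelFormatType_DepthFloat32 buffer into a flat [Float32] array,
    /// row by row, so it can be indexed as uu + vv * depthWidth.
    func depthValues() -> [Float32] {
        CVPixelBufferLockBaseAddress(self, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(self, .readOnly) }

        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        let rowBytes = CVPixelBufferGetBytesPerRow(self)
        guard let base = CVPixelBufferGetBaseAddress(self) else { return [] }

        var values = [Float32]()
        values.reserveCapacity(width * height)
        for row in 0..<height {
            let rowPointer = base.advanced(by: row * rowBytes).assumingMemoryBound(to: Float32.self)
            values.append(contentsOf: UnsafeBufferPointer(start: rowPointer, count: width))
        }
        return values
    }
}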
The resulting point cloud looks ok, but it is severely bent on the Z-axis. I realize this code is also not adjusting for screen orientation.
Cupertino kindly got back to me with this response on the forums at developer.apple.com:
The unprojection calculation itself is going to be identical, regardless of whether it is done CPU side or GPU side.
CPU side, the calculation would look something like this:
/// Returns a world-space position given a point in the camera image, the eye-space depth
/// (sampled/read from the corresponding point in the depth image), the inverse camera
/// intrinsics, and the inverse view matrix.
func worldPoint(cameraPoint: SIMD2<Float>, eyeDepth: Float, cameraIntrinsicsInversed: simd_float3x3, viewMatrixInversed: simd_float4x4) -> SIMD3<Float> {
    // Unproject the pixel through the inverse intrinsics; the camera looks down -Z, hence -eyeDepth.
    let localPoint = cameraIntrinsicsInversed * simd_float3(cameraPoint, 1) * -eyeDepth
    // Transform from camera (eye) space into world space.
    let worldPoint = viewMatrixInversed * simd_float4(localPoint, 1)
    return (worldPoint / worldPoint.w)[SIMD3(0, 1, 2)]
}
Implemented, this looks like:
for vv in 0..<depthHeight {
    for uu in 0..<depthWidth {
        let z = -depthValues[uu + vv * depthWidth]
        let viewMatInverted = (sceneView.session.currentFrame?.camera.viewMatrix(for: UIApplication.shared.statusBarOrientation))!.inverse
        let worldPoint = worldPoint(cameraPoint: SIMD2(Float(uu), Float(vv)),
                                    eyeDepth: z,
                                    cameraIntrinsicsInversed: intrinsics.inverse,
                                    viewMatrixInversed: viewMatInverted * rotateToARCamera)
        points.append(SCNVector3(worldPoint))
    }
}
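rotateToARCamera above is the orientation-correction matrix from the Apple sample: it flips the Y and Z axes into ARKit's camera convention and then rotates around Z by the camera-to-display angle for the current interface orientation. A rough sketch of how such a matrix can be built (treat this as an approximation of the sample's helper, not a verbatim copy):
import UIKit
import simd

/// Approximate reconstruction of the Apple sample's rotateToARCamera matrix:
/// flip Y and Z into the ARKit camera convention, then rotate around Z by the
/// camera-to-display rotation for the current interface orientation.
func makeRotateToARCamera(for orientation: UIInterfaceOrientation) -> simd_float4x4 {
    // Flip to the ARKit camera's coordinate system (Y up, Z towards the viewer).
    let flipYZ = simd_float4x4(
        SIMD4<Float>(1,  0,  0, 0),
        SIMD4<Float>(0, -1,  0, 0),
        SIMD4<Float>(0,  0, -1, 0),
        SIMD4<Float>(0,  0,  0, 1))

    let degrees: Float
    switch orientation {
    case .landscapeLeft:      degrees = 180
    case .portrait:           degrees = 90
    case .portraitUpsideDown: degrees = -90
    default:                  degrees = 0
    }
    let rotation = simd_quatf(angle: degrees * .pi / 180, axis: SIMD3<Float>(0, 0, 1))
    return flipYZ * simd_float4x4(rotation)
}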
The point cloud is still pretty messy and needs the confidence values worked out, and there are vertical gaps where Int rounding has occurred, but it's a solid start. The missing functions come from the Apple demo project linked in the question above.
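For the confidence part, ARDepthData also exposes a confidenceMap (one UInt8 per pixel holding an ARConfidenceLevel raw value) at the same resolution as the depth map, so low-confidence samples can be skipped. A rough sketch (confidenceValues(from:) is just an illustrative helper name):
import ARKit

/// Rough sketch: read the confidence map (one UInt8 ARConfidenceLevel raw value
/// per pixel) so that low-confidence depth samples can be skipped in the loop above.
func confidenceValues(from depthData: ARDepthData) -> [ARConfidenceLevel]? {
    guard let confidenceMap = depthData.confidenceMap else { return nil }

    CVPixelBufferLockBaseAddress(confidenceMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(confidenceMap, .readOnly) }

    let width = CVPixelBufferGetWidth(confidenceMap)
    let height = CVPixelBufferGetHeight(confidenceMap)
    let rowBytes = CVPixelBufferGetBytesPerRow(confidenceMap)
    guard let base = CVPixelBufferGetBaseAddress(confidenceMap) else { return nil }

    var values = [ARConfidenceLevel]()
    values.reserveCapacity(width * height)
    for row in 0..<height {
        let rowPointer = base.advanced(by: row * rowBytes).assumingMemoryBound(to: UInt8.self)
        for column in 0..<width {
            values.append(ARConfidenceLevel(rawValue: Int(rowPointer[column])) ?? .low)
        }
    }
    return values
}

// Possible usage inside the unprojection loop:
// guard confidence[uu + vv * depthWidth] == .high else { continue }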