This question somewhat builds on this post, wherein the idea is to take the ARMeshGeometry from an iOS device with a LiDAR scanner, calculate the texture coordinates, and apply the sampled camera frame as the texture for a given mesh, thereby allowing a user to create a "photorealistic" 3D representation of their environment.
Per that post, I have adapted one of the responses to calculate the texture coordinates, like so:
func buildGeometry(meshAnchor: ARMeshAnchor, arFrame: ARFrame) -> SCNGeometry {
    let vertices = meshAnchor.geometry.vertices
    let faces = meshAnchor.geometry.faces
    let camera = arFrame.camera
    let size = arFrame.camera.imageResolution

    // use the MTLBuffer that ARKit gives us
    let vertexSource = SCNGeometrySource(buffer: vertices.buffer, vertexFormat: vertices.format, semantic: .vertex, vertexCount: vertices.count, dataOffset: vertices.offset, dataStride: vertices.stride)

    // set the camera matrix
    let modelMatrix = meshAnchor.transform

    var textCords = [CGPoint]()
    for index in 0..<vertices.count {
        let vertexPointer = vertices.buffer.contents().advanced(by: vertices.offset + vertices.stride * index)
        let vertex = vertexPointer.assumingMemoryBound(to: (Float, Float, Float).self).pointee

        // project each vertex into the camera image to get its u,v coordinates
        let vertex4 = SIMD4<Float>(vertex.0, vertex.1, vertex.2, 1)
        let world_vertex4 = simd_mul(modelMatrix, vertex4)
        let world_vector3 = simd_float3(x: world_vertex4.x, y: world_vertex4.y, z: world_vertex4.z)
        let pt = camera.projectPoint(world_vector3, orientation: .portrait, viewportSize: CGSize(width: CGFloat(size.height), height: CGFloat(size.width)))
        let v = 1.0 - Float(pt.x) / Float(size.height)
        let u = Float(pt.y) / Float(size.width)
        //let z = vector_float2(u, v)
        let c = CGPoint(x: CGFloat(v), y: CGFloat(u))
        textCords.append(c)
    }

    // set up the texture coordinates
    let textureSource = SCNGeometrySource(textureCoordinates: textCords)

    // set up the normals
    let normalsSource = SCNGeometrySource(meshAnchor.geometry.normals, semantic: .normal)

    // set up the geometry
    let faceData = Data(bytesNoCopy: faces.buffer.contents(), count: faces.buffer.length, deallocator: .none)
    let geometryElement = SCNGeometryElement(data: faceData, primitiveType: .triangles, primitiveCount: faces.count, bytesPerIndex: faces.bytesPerIndex)
    let nodeGeometry = SCNGeometry(sources: [vertexSource, textureSource, normalsSource], elements: [geometryElement])

    /* set up the texture - THIS IS WHERE I AM STUCK
    let texture = textureConverter.makeTextureForMeshModel(frame: arFrame)
    */

    let imageMaterial = SCNMaterial()
    imageMaterial.isDoubleSided = false
    imageMaterial.diffuse.contents = texture!
    nodeGeometry.materials = [imageMaterial]

    return nodeGeometry
}
Where I am struggling is to determine whether these texture coordinates are actually being calculated properly, and subsequently, how I would sample the camera frame to apply the relevant frame image as the texture for that mesh.
The linked question indicated that converting the ARFrame's capturedImage property (which is a CVPixelBuffer) to an MTLTexture would be ideal for real-time performance, but it has become apparent to me that the CVPixelBuffer is a YCbCr image, whereas I believe I would need an RGB image.
In my textureConverter class, I am attempting to convert the CVPixelBuffer to an MTLTexture, but I am unsure how to return an RGB MTLTexture:
func makeTextureForMeshModel(frame: ARFrame) -> MTLTexture? {
    if CVPixelBufferGetPlaneCount(frame.capturedImage) < 2 {
        return nil
    }
    let cameraImageTextureY = createTexture(fromPixelBuffer: frame.capturedImage, pixelFormat: .r8Unorm, planeIndex: 0)
    let cameraImageTextureCbCr = createTexture(fromPixelBuffer: frame.capturedImage, pixelFormat: .rg8Unorm, planeIndex: 1)

    /* How do I blend the Y and CbCr textures, or return a RGB texture, to return a single MTLTexture?
    return ...
    */
}

func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> CVMetalTexture? {
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)

    var texture: CVMetalTexture? = nil
    let status = CVMetalTextureCacheCreateTextureFromImage(nil, textureCache, pixelBuffer, nil, pixelFormat,
                                                           width, height, planeIndex, &texture)
    if status != kCVReturnSuccess {
        texture = nil
    }
    return texture
}
In the end, I'm not entirely sure whether I really need an RGB texture vs. a YCbCr texture, but I am still unsure how I would return the proper image for texturing (my attempts to return just the CVPixelBuffer without worrying about the YCbCr color space, by manually setting a texture format, result in a very bizarre looking image).
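One route I've been considering (just a rough sketch, not something that's wired into the code above) is to skip blending the Y and CbCr planes myself and let Core Image do the colour conversion while rendering into a Metal texture. The device, commandQueue, and ciContext properties here are assumptions about my converter class; note also that the per-plane createTexture approach above returns a CVMetalTexture, which would still need CVMetalTextureGetTexture to get at the underlying MTLTexture:

import ARKit
import CoreImage
import Metal

// Rough sketch only: convert the YCbCr capturedImage into a single BGRA (RGB-style)
// MTLTexture by letting Core Image perform the colour conversion on render.
// Assumes `device`, `commandQueue`, and `ciContext` are set up elsewhere in the class.
func makeRGBTexture(from frame: ARFrame) -> MTLTexture? {
    let pixelBuffer = frame.capturedImage
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)

    // destination texture in an RGB-style format
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                              width: width,
                                                              height: height,
                                                              mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite, .renderTarget]
    guard let texture = device.makeTexture(descriptor: descriptor),
          let commandBuffer = commandQueue.makeCommandBuffer() else { return nil }

    // CIImage understands the two-plane YCbCr buffer and converts to RGB when rendering
    let image = CIImage(cvPixelBuffer: pixelBuffer)
    ciContext.render(image,
                     to: texture,
                     commandBuffer: commandBuffer,
                     bounds: CGRect(x: 0, y: 0, width: width, height: height),
                     colorSpace: CGColorSpaceCreateDeviceRGB())
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    return texture
}

Whether the round trip through Core Image is fast enough for per-frame use is something I haven't measured.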
You can check out my repository here: MetalWorldTextureScan
The project demonstrates how to calculate texture coordinates for the scanned mesh and texture it with saved camera frames. The getTextureCoord function takes:
frame: saved frame to use for the texture
vert: the vertex you want to project into the frame
aTrans: the transform for the 'chunk' of mesh the vertex is a part of
func getTextureCoord(frame: ARFrame, vert: SIMD3<Float>, aTrans: simd_float4x4) -> vector_float2 {
    // convert the vertex to world coordinates
    let cam = frame.camera
    let size = cam.imageResolution
    let vertex4 = vector_float4(vert.x, vert.y, vert.z, 1)
    let world_vertex4 = simd_mul(aTrans, vertex4)
    let world_vector3 = simd_float3(x: world_vertex4.x, y: world_vertex4.y, z: world_vertex4.z)

    // project the point into the camera image to get u,v
    let pt = cam.projectPoint(world_vector3,
                              orientation: .portrait,
                              viewportSize: CGSize(width: CGFloat(size.height),
                                                   height: CGFloat(size.width)))
    let v = 1.0 - Float(pt.x) / Float(size.height)
    let u = Float(pt.y) / Float(size.width)

    let tCoord = vector_float2(u, v)
    return tCoord
}
A struct called 'TextureFrame' is used to hold the position, ARFrame, and potentially other useful info.
struct TextureFrame {
    var key: String     // date/time/anything
    var dist: CGFloat   // dist from bBox
    var frame: ARFrame  // saved frame
    var pos: SCNVector3 // location in reference to bBox
}
How it's used:
func saveTextureFrame() {
    guard let frame = session.currentFrame else {
        print("can't get current frame")
        return
    }

    // camera position and its offset/distance from the scan's bounding box
    let camTrans = frame.camera.transform
    let camPos = SCNVector3(camTrans.columns.3.x, camTrans.columns.3.y, camTrans.columns.3.z)
    let cam2BoxLocal = SCNVector3(camPos.x - bBoxOrigin.x, camPos.y - bBoxOrigin.y, camPos.z - bBoxOrigin.z)
    let dist = dist3D(a: camPos, b: bBoxOrigin)

    // timestamp used as the frame's key
    let dateFormatter = DateFormatter()
    dateFormatter.dateFormat = "yyyy:MM:dd:HH:mm:ss:SS"
    dateFormatter.timeZone = TimeZone(abbreviation: "CDT")
    let date = Date()
    let dString = dateFormatter.string(from: date)

    let textFrame = TextureFrame(key: dString, dist: dist, frame: frame, pos: cam2BoxLocal)
    textureCloud.append(textFrame)
    delegate.didSaveFrame(renderer: self)
}
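The renderer just appends every saved frame to textureCloud, so at texturing time you need to pick one of them per face. makeTexturedMesh() below simply grabs the last saved frame, but here is a minimal sketch of where a "best fit" selection could go (this helper is not in the repo; it just reuses the dist3D() function from above and ignores viewing angle):

// Hypothetical helper, not part of the repo: pick the saved frame whose camera
// was closest to a given world-space vertex.
func bestTextureFrame(for worldVert: SCNVector3, in cloud: [TextureFrame]) -> TextureFrame? {
    return cloud.min { a, b in
        let ca = a.frame.camera.transform.columns.3
        let cb = b.frame.camera.transform.columns.3
        let distA = dist3D(a: SCNVector3(ca.x, ca.y, ca.z), b: worldVert)
        let distB = dist3D(a: SCNVector3(cb.x, cb.y, cb.z), b: worldVert)
        return distA < distB
    }
}

A more robust version would also check that the vertex actually projects inside that frame's image and that the face roughly faces the camera.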
The texturing happens in the makeTexturedMesh() function. We iterate through all the mesh chunks, and through each face (triangle) of each chunk; for every face the texture coordinates are calculated, and a single SCNGeometry is created for the triangle and added to the scene. A recombineGeometries() function is also defined for, well, recombining the triangles.
func makeTexturedMesh() {
    let worldMeshes = renderer.worldMeshes
    let textureCloud = renderer.textureCloud
    print("texture images: \(textureImgs.count)")

    // each 'mesh' is a chunk of the whole scan
    for mesh in worldMeshes {
        let aTrans = SCNMatrix4(mesh.transform)
        let vertices: ARGeometrySource = mesh.vertices
        let normals: ARGeometrySource = mesh.normals
        let faces: ARGeometryElement = mesh.submesh
        var texture: UIImage!

        // a face is just a list of three indices, each representing a vertex
        for f in 0..<faces.count {
            // check to see if each vertex of the face is inside of our box
            var c = 0
            let face = face(at: f, faces: faces)
            for fv in face {
                // this is set by the renderer
                if mesh.inBox[fv] == 1 {
                    c += 1
                }
            }
            guard c == 3 else { continue }

            // all verts of the face are in the box, so the triangle is visible
            var fVerts: [SCNVector3] = []
            var fNorms: [SCNVector3] = []
            var tCoords: [vector_float2] = []

            // convert each vertex and normal to world coordinates
            // and get the texture coordinates
            for fv in face {
                let vert = vertex(at: UInt32(fv), vertices: vertices)
                let vTrans = SCNMatrix4MakeTranslation(vert[0], vert[1], vert[2])
                let wTrans = SCNMatrix4Mult(vTrans, aTrans)
                let wPos = SCNVector3(wTrans.m41, wTrans.m42, wTrans.m43)
                fVerts.append(wPos)

                let norm = normal(at: UInt32(fv), normals: normals)
                let nTrans = SCNMatrix4MakeTranslation(norm[0], norm[1], norm[2])
                let wNTrans = SCNMatrix4Mult(nTrans, aTrans)
                let wNPos = SCNVector3(wNTrans.m41, wNTrans.m42, wNTrans.m43)
                fNorms.append(wNPos)

                // here's where you would find the frame that best fits
                // for simplicity, just use the last frame here
                let tFrame = textureCloud.last!.frame
                let tCoord = getTextureCoord(frame: tFrame, vert: vert, aTrans: mesh.transform)
                tCoords.append(tCoord)
                texture = textureImgs[textureCloud.count - 1]

                // visualize the normals if you want
                if mesh.inBox[fv] == 1 {
                    //let normVis = lineBetweenNodes(positionA: wPos, positionB: wNPos, inScene: arView.scene)
                    //arView.scene.rootNode.addChildNode(normVis)
                }
            }

            allVerts.append(fVerts)
            allNorms.append(fNorms)
            allTCrds.append(tCoords)

            // make a single triangle mesh out of each face
            let vertsSource = SCNGeometrySource(vertices: fVerts)
            let normsSource = SCNGeometrySource(normals: fNorms)
            let facesSource = SCNGeometryElement(indices: [UInt32(0), UInt32(1), UInt32(2)], primitiveType: .triangles)
            let textrSource = SCNGeometrySource(textureCoordinates: tCoords)
            let geom = SCNGeometry(sources: [vertsSource, normsSource, textrSource], elements: [facesSource])

            // texture it with a saved camera frame
            let mat = SCNMaterial()
            mat.diffuse.contents = texture
            mat.isDoubleSided = false
            geom.materials = [mat]

            let meshNode = SCNNode(geometry: geom)
            DispatchQueue.main.async {
                self.scanNode.addChildNode(meshNode)
            }
        }
    }
}
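For completeness, here is a rough sketch of what recombineGeometries() could look like, built from the allVerts / allNorms / allTCrds arrays filled in above (the real implementation is in the repo; this sketch also sidesteps the problem that each triangle may have been textured from a different frame):

// Rough sketch: flatten the per-triangle arrays collected in makeTexturedMesh()
// into a single SCNGeometry. Every three consecutive vertices form one triangle.
func recombineGeometries() -> SCNGeometry {
    let verts = allVerts.flatMap { $0 }
    let norms = allNorms.flatMap { $0 }
    let crds = allTCrds.flatMap { $0 }

    let indices = (0..<verts.count).map { UInt32($0) }

    let vertsSource = SCNGeometrySource(vertices: verts)
    let normsSource = SCNGeometrySource(normals: norms)
    // use the stock [CGPoint] initializer for texture coordinates
    let textrSource = SCNGeometrySource(textureCoordinates: crds.map { CGPoint(x: CGFloat($0.x), y: CGFloat($0.y)) })
    let element = SCNGeometryElement(indices: indices, primitiveType: .triangles)

    return SCNGeometry(sources: [vertsSource, normsSource, textrSource], elements: [element])
}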
The project also includes methods for saving and loading meshes from the documents directory.
This is by no means the best way to go about meshing and texturing a 3D scan, but it's a good demonstration of how to get started with built-in iOS frameworks.
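Saving can be as simple as handing the scan node to SceneKit's exporter; a minimal sketch (assumed, not necessarily the repo's exact code):

// Minimal sketch: export the textured scan node to the documents directory
// as a .scn file using SceneKit's built-in writer.
func saveScan(named name: String) {
    let scene = SCNScene()
    // clone so the node also stays in the live AR scene
    scene.rootNode.addChildNode(scanNode.clone())

    let docs = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    let url = docs.appendingPathComponent("\(name).scn")
    let success = scene.write(to: url, options: nil, delegate: nil, progressHandler: nil)
    print("saved scan: \(success)")
}

// Loading it back:
// let loaded = try SCNScene(url: url, options: nil)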