Texture ARMeshGeometry from ARKit Camera frame?

This question builds on this post, where the idea is to take the ARMeshGeometry from an iOS device with a LiDAR scanner, calculate the texture coordinates, and apply the sampled camera frame as the texture for a given mesh, thereby allowing a user to create a "photorealistic" 3D representation of their environment.

Per that post, I have adapted one of the responses to calculate the texture coordinates, like so:

func buildGeometry(meshAnchor: ARMeshAnchor, arFrame: ARFrame) -> SCNGeometry {
    let vertices = meshAnchor.geometry.vertices

    let faces = meshAnchor.geometry.faces
    let camera = arFrame.camera
    let size = arFrame.camera.imageResolution
    // use the MTL buffer that ARKit gives us
    let vertexSource = SCNGeometrySource(buffer: vertices.buffer, vertexFormat: vertices.format, semantic: .vertex, vertexCount: vertices.count, dataOffset: vertices.offset, dataStride: vertices.stride)
    // set the camera matrix
    let modelMatrix = meshAnchor.transform
    var textCords = [CGPoint]()
    for index in 0..<vertices.count {
        let vertexPointer = vertices.buffer.contents().advanced(by: vertices.offset + vertices.stride * index)
        let vertex = vertexPointer.assumingMemoryBound(to: (Float, Float, Float).self).pointee
        let vertex4 = SIMD4<Float>(vertex.0, vertex.1, vertex.2, 1)
        let world_vertex4 = simd_mul(modelMatrix, vertex4)
        let world_vector3 = simd_float3(x: world_vertex4.x, y: world_vertex4.y, z: world_vertex4.z)
        let pt = camera.projectPoint(world_vector3, orientation: .portrait, viewportSize: CGSize(width: CGFloat(size.height), height: CGFloat(size.width)))
        let v = 1.0 - Float(pt.x) / Float(size.height)
        let u = Float(pt.y) / Float(size.width)
        //let z = vector_float2(u, v)
        let c = CGPoint(x: v, y: u)
        textCords.append(c)
    }
    // Setup the texture coordinates
    let textureSource = SCNGeometrySource(textureCoordinates: textCords)
    // Setup the normals
    let normalsSource = SCNGeometrySource(meshAnchor.geometry.normals, semantic: .normal)
    // Setup the geometry
    let faceData = Data(bytesNoCopy: faces.buffer.contents(), count: faces.buffer.length, deallocator: .none)
    let geometryElement = SCNGeometryElement(data: faceData, primitiveType: .triangles, primitiveCount: faces.count, bytesPerIndex: faces.bytesPerIndex)
    let nodeGeometry = SCNGeometry(sources: [vertexSource, textureSource, normalsSource], elements: [geometryElement])
    /* Setup texture - THIS IS WHERE I AM STUCK
    let texture = textureConverter.makeTextureForMeshModel(frame: arFrame)
    */
    let imageMaterial = SCNMaterial()
    imageMaterial.isDoubleSided = false
    imageMaterial.diffuse.contents = texture!
    nodeGeometry.materials = [imageMaterial]
    return nodeGeometry
}

Where I am struggling is in determining whether these texture coordinates are actually being calculated correctly and, subsequently, how I would sample the camera frame to apply the relevant frame image as the texture for that mesh.

The linked question indicated that converting the ARFrame's capturedImage property (which is a CVPixelBuffer) to an MTLTexture would be ideal for real-time performance, but it has become apparent to me that the CVPixelBuffer is a YCbCr image, whereas I believe I need an RGB image.
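
For reference, the arithmetic behind a YCbCr to RGB conversion is small. The following is a minimal sketch of the per-pixel step that a fragment shader would perform after sampling the Y plane and the CbCr plane. It assumes full-range BT.601 coefficients and components normalized to 0...1; the function name is hypothetical, and a buffer using a different matrix or video range would need different numbers.

```swift
// Converts one YCbCr sample to RGB. All components are expected in 0...1.
// Assumption: full-range BT.601 coefficients.
func rgbFromYCbCr(y: Float, cb: Float, cr: Float) -> (r: Float, g: Float, b: Float) {
    // Clamp results, since some inputs land slightly outside 0...1.
    func clamp(_ x: Float) -> Float { return min(max(x, 0), 1) }

    // Chroma is stored with a 0.5 offset, so recentre it around zero first.
    let cbCentered = cb - 0.5
    let crCentered = cr - 0.5

    let r = y + 1.4020 * crCentered
    let g = y - 0.3441 * cbCentered - 0.7141 * crCentered
    let b = y + 1.7720 * cbCentered
    return (clamp(r), clamp(g), clamp(b))
}
```

With neutral chroma (cb and cr both 0.5) the output is a grey equal to the luma value, which is a quick sanity check for any implementation of this step.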

In my textureConverter class, I am attempting to convert the CVPixelBuffer to an MTLTexture, but am unsure how to return an RGB MTLTexture:

func makeTextureForMeshModel(frame: ARFrame) -> MTLTexture? {
    if CVPixelBufferGetPlaneCount(frame.capturedImage) < 2 {
        return nil
    }
    let cameraImageTextureY = createTexture(fromPixelBuffer: frame.capturedImage, pixelFormat: .r8Unorm, planeIndex: 0)
    let cameraImageTextureCbCr = createTexture(fromPixelBuffer: frame.capturedImage, pixelFormat: .rg8Unorm, planeIndex: 1)
    /* How do I blend the Y and CbCr textures, or return an RGB texture, to return a single MTLTexture?
    return ...
    */
}

func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> CVMetalTexture? {
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
    var texture: CVMetalTexture? = nil
    let status = CVMetalTextureCacheCreateTextureFromImage(nil, textureCache, pixelBuffer, nil, pixelFormat,
                                                           width, height, planeIndex, &texture)
    if status != kCVReturnSuccess {
        texture = nil
    }
    return texture
}

In the end, I'm not entirely sure whether I really need an RGB texture rather than a YCbCr texture, and I am still unsure how to return the proper image for texturing (my attempts to return just the CVPixelBuffer without worrying about the YCbCr color space, by manually setting a texture format, result in a very bizarre-looking image).

You can check out my repository here: MetalWorldTextureScan

The project demonstrates how to:

  • Render a mesh with Metal while scanning
  • Crop your scan to a bounding box
  • Save camera frames for texturing
  • Create and texture an SCNGeometry from your scan

Calculating texture coordinates:

frame: saved frame to use for the texture

vert: the vertex you want to project into the frame

aTrans: the transform for the 'chunk' of mesh the vertex is a part of

func getTextureCoord(frame: ARFrame, vert: SIMD3<Float>, aTrans: simd_float4x4) -> vector_float2 {
    // convert vertex to world coordinates
    let cam = frame.camera
    let size = cam.imageResolution
    let vertex4 = vector_float4(vert.x, vert.y, vert.z, 1)
    let world_vertex4 = simd_mul(aTrans, vertex4)
    let world_vector3 = simd_float3(x: world_vertex4.x, y: world_vertex4.y, z: world_vertex4.z)
    // project the point into the camera image to get u,v
    let pt = cam.projectPoint(world_vector3,
        orientation: .portrait,
        viewportSize: CGSize(
            width: CGFloat(size.height),
            height: CGFloat(size.width)))
    let v = 1.0 - Float(pt.x) / Float(size.height)
    let u = Float(pt.y) / Float(size.width)
    let tCoord = vector_float2(u, v)
    return tCoord
}
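
The u/v arithmetic at the end of getTextureCoord can be checked in isolation. The sketch below isolates that step; the helper names are hypothetical. The point is projected into a portrait viewport whose width is the image height and whose height is the image width, so the portrait y axis maps to u and the flipped portrait x axis maps to v. A result outside 0...1 means the vertex projects outside that frame, so that frame cannot supply its colour.

```swift
// Hypothetical helper mirroring the last lines of getTextureCoord.
// px, py: projected point in a portrait viewport of size (imageHeight x imageWidth).
// imageWidth, imageHeight: the landscape camera image resolution.
func textureCoord(px: Float, py: Float,
                  imageWidth: Float, imageHeight: Float) -> (u: Float, v: Float) {
    let u = py / imageWidth          // portrait y runs along the image's long side
    let v = 1.0 - px / imageHeight   // portrait x runs along the short side, flipped
    return (u, v)
}

// True when the coordinate falls inside the camera image.
func isInsideImage(u: Float, v: Float) -> Bool {
    return (0.0...1.0).contains(u) && (0.0...1.0).contains(v)
}
```

For example, with a 1920 x 1440 image the centre of the portrait viewport (720, 960) maps to (0.5, 0.5). Running isInsideImage on every vertex of a face is one simple way to reject frames that do not see the whole triangle.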

Saving a frame for texturing:

A struct called 'TextureFrame' is used to hold the position, ARFrame, and potentially other useful info.

struct TextureFrame {
    var key: String       // date/time/anything
    var dist: CGFloat     // dist from bBox
    var frame: ARFrame    // saved frame
    var pos: SCNVector3   // location in reference to bBox
}

How it's used:

func saveTextureFrame() {
    guard let frame = session.currentFrame else {
        print("can't get current frame")
        return
    }
    let camTrans = frame.camera.transform
    let camPos = SCNVector3(camTrans.columns.3.x, camTrans.columns.3.y, camTrans.columns.3.z)
    let cam2BoxLocal = SCNVector3(camPos.x - bBoxOrigin.x, camPos.y - bBoxOrigin.y, camPos.z - bBoxOrigin.z)
    let dist = dist3D(a: camPos, b: bBoxOrigin)
    let dateFormatter = DateFormatter()
    dateFormatter.dateFormat = "yyyy:MM:dd:HH:mm:ss:SS"
    dateFormatter.timeZone = TimeZone(abbreviation: "CDT")
    let date = Date()
    let dString = dateFormatter.string(from: date)
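    let textFrame = TextureFrame(key: dString, dist: dist, frame: frame, pos: cam2BoxLocal)
    // store the frame so makeTexturedMesh() can find it in textureCloud
    textureCloud.append(textFrame)
    delegate.didSaveFrame(renderer: self)
}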
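
The texturing code further down reads UIImage values from textureImgs. One way such an image could be produced from a saved frame's capturedImage is Core Image, which takes care of the YCbCr to RGB conversion. This is a minimal sketch under that assumption, not necessarily how the repository does it; the function name is hypothetical.

```swift
import CoreImage
import CoreVideo
import UIKit

// Hypothetical helper: turns a camera pixel buffer (for example
// ARFrame.capturedImage) into an RGB UIImage.
func makeTextureImage(from pixelBuffer: CVPixelBuffer, context: CIContext) -> UIImage? {
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    // Render the full extent without rotating, so the pixels keep the
    // native sensor orientation that the u,v values refer to.
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}
```

Creating a CIContext is relatively expensive, so it is usually worth making one and reusing it for every saved frame.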

Making the textured mesh:

This happens in the makeTexturedMesh() function. We iterate through all the mesh chunks, and iterate through each face (triangle) of each chunk, where the texture coordinates are calculated, and a single SCNGeometry is created for the triangle and added to the scene. A recombineGeometries() function is also defined for, well, recombining the triangles.

func makeTexturedMesh() {
    let worldMeshes = renderer.worldMeshes
    let textureCloud = renderer.textureCloud
    print("texture images: \(textureImgs.count)")
    // each 'mesh' is a chunk of the whole scan
    for mesh in worldMeshes {
        let aTrans = SCNMatrix4(mesh.transform)
        let vertices: ARGeometrySource = mesh.vertices
        let normals: ARGeometrySource = mesh.normals
        let faces: ARGeometryElement = mesh.submesh
        var texture: UIImage!
        // a face is just a list of three indices, each representing a vertex
        for f in 0..<faces.count {
            // check to see if each vertex of the face is inside of our box
            var c = 0
            let face = face(at: f, faces: faces)
            for fv in face {
                // this is set by the renderer
                if mesh.inBox[fv] == 1 {
                    c += 1
                }
            }
            guard c == 3 else { continue }
            // all verts of the face are in the box, so the triangle is visible
            var fVerts: [SCNVector3] = []
            var fNorms: [SCNVector3] = []
            var tCoords: [vector_float2] = []
            // convert each vertex and normal to world coordinates
            // get the texture coordinates
            for fv in face {
                let vert = vertex(at: UInt32(fv), vertices: vertices)
                let vTrans = SCNMatrix4MakeTranslation(vert[0], vert[1], vert[2])
                let wTrans = SCNMatrix4Mult(vTrans, aTrans)
                let wPos = SCNVector3(wTrans.m41, wTrans.m42, wTrans.m43)
                fVerts.append(wPos)
                let norm = normal(at: UInt32(fv), normals: normals)
                let nTrans = SCNMatrix4MakeTranslation(norm[0], norm[1], norm[2])
                let wNTrans = SCNMatrix4Mult(nTrans, aTrans)
                // take all three components from wNTrans (the y component previously came from wTrans)
                let wNPos = SCNVector3(wNTrans.m41, wNTrans.m42, wNTrans.m43)
                fNorms.append(wNPos)
                // here's where you would find the frame that best fits
                // for simplicity, just use the last frame here
                let tFrame = textureCloud.last!.frame
                let tCoord = getTextureCoord(frame: tFrame, vert: vert, aTrans: mesh.transform)
                tCoords.append(tCoord)
                texture = textureImgs[textureCloud.count - 1]
                // visualize the normals if you want
                if mesh.inBox[fv] == 1 {
                    //let normVis = lineBetweenNodes(positionA: wPos, positionB: wNPos, inScene: arView.scene)
                }
            }
            // make a single triangle mesh out each face
            let vertsSource = SCNGeometrySource(vertices: fVerts)
            let normsSource = SCNGeometrySource(normals: fNorms)
            let facesSource = SCNGeometryElement(indices: [UInt32(0), UInt32(1), UInt32(2)], primitiveType: .triangles)
            let textrSource = SCNGeometrySource(textureCoordinates: tCoords)
            let geom = SCNGeometry(sources: [vertsSource, normsSource, textrSource], elements: [facesSource])
            // texture it with a saved camera frame
            let mat = SCNMaterial()
            mat.diffuse.contents = texture
            mat.isDoubleSided = false
            geom.materials = [mat]
            let meshNode = SCNNode(geometry: geom)
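            DispatchQueue.main.async {
                // add the triangle to the scene on the main thread
                // (assumption: arView is the scene view, as in the commented-out line above)
                arView.scene.rootNode.addChildNode(meshNode)
            }
        }
    }
}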
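
The comment "here's where you would find the frame that best fits" leaves the selection rule open. One plausible rule, sketched below, is to pick the saved frame whose camera was closest to the vertex, skipping frames that do not see it. The FrameCandidate type, the function name, and the visibility callback are all hypothetical; other criteria, such as the angle between the view direction and the face normal, may work better.

```swift
// Hypothetical, simplified stand-in for TextureFrame: only the camera position.
struct FrameCandidate {
    var key: String
    var cameraPosition: (x: Float, y: Float, z: Float)
}

// Returns the index of the candidate whose camera is closest to `point`,
// skipping candidates for which `isVisible` returns false.
// Returns nil when no candidate qualifies.
func bestFrameIndex(for point: (x: Float, y: Float, z: Float),
                    in candidates: [FrameCandidate],
                    isVisible: (Int) -> Bool) -> Int? {
    var best: Int? = nil
    var bestDistSq = Float.greatestFiniteMagnitude
    for (i, candidate) in candidates.enumerated() where isVisible(i) {
        let dx = candidate.cameraPosition.x - point.x
        let dy = candidate.cameraPosition.y - point.y
        let dz = candidate.cameraPosition.z - point.z
        // Squared distance is enough for comparison and avoids a square root.
        let distSq = dx * dx + dy * dy + dz * dz
        if distSq < bestDistSq {
            bestDistSq = distSq
            best = i
        }
    }
    return best
}
```

In the loop above, the returned index would select both the frame passed to getTextureCoord and the matching image in textureImgs, and the visibility callback could test whether the projected texture coordinate lies within 0...1.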

The project also includes methods for saving and loading meshes from the documents directory.

This is by no means the best way to go about meshing and texturing a 3D scan, but it's a good demonstration of how to get started with built-in iOS frameworks.

