
ARKit with multiple users

What is the best way, if any, to use Apple's new ARKit with multiple users/devices?

It seems that each device gets its own scene understanding individually. My best guess so far is to use the raw feature point positions and try to match them across devices to glue together the different points of view, since ARKit doesn't offer any absolute frame of reference.

=== Edit 1: Things I've tried ===

1) Feature points

I've played around with the exposed raw feature points and I'm now convinced that in their current state they are a dead end:

  • they aren't really raw feature points: they expose only positions, none of the attributes (e.g. descriptors) typically found in tracked feature points
  • they don't persist from frame to frame, nor are their positions exactly the same
  • the reported feature points often change drastically even when the camera input is barely changing, with many of them appearing or disappearing

So overall I think it's unreasonable to try to use them in any meaningful way: you can't make good point matches within one device, let alone across several. An alternative would be to implement my own feature point detection and matching, but that would be more replacing ARKit than leveraging it.
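For the record, here's roughly everything you get to work with. This is just a minimal sketch (the function name is mine) of reading the per-frame point cloud, which is essentially a bare list of 3D positions:

    import ARKit

    // Minimal sketch: the "raw feature points" ARKit exposes are just the
    // current frame's point cloud, i.e. an array of 3D positions in session
    // (world) coordinates, with no descriptors you could match against
    // another device's points.
    func dumpFeaturePoints(from session: ARSession) {
        guard let pointCloud = session.currentFrame?.rawFeaturePoints else { return }
        for position in pointCloud.points {
            print("feature point at \(position)")
        }
    }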

2) QR code

As @Rickster suggested, I've also tried placing an easily identifiable object like a QR code in the scene and deriving the relative frame-of-reference change from that fixed point (see this question). It's a bit tricky and required me to use some OpenCV to estimate the camera pose. But more importantly, it's very limiting.

asked Sep 09 '25 by Guig


1 Answer

As some newer answers have added, multiuser AR is a headline feature of ARKit 2 (aka ARKit on iOS 12). The WWDC18 talk on ARKit 2 has a nice overview, and Apple has two developer sample code projects to help you get started: a basic example that just gets 2+ devices into a shared experience, and SwiftShot, a real multiplayer game built for AR.

The major points:

  1. ARWorldMap wraps up everything ARKit knows about the local environment into a serializable object, so you can save it for later or send it to another device. In the latter case, "relocalizing" to a world map saved by another device in the same local environment gives both devices the same frame of reference (world coordinate system). (A minimal sketch of saving, sending, and relocalizing to a world map follows this list.)

  2. Use the networking technology of your choice to send the ARWorldMap between devices: AirDrop, cloud shares, carrier pigeon, etc. all work, but Apple's Multipeer Connectivity framework is one good, easy, and secure option, so it's what Apple uses in their example projects.

  3. All of this gives you only the basis for creating a shared experience — multiple copies of your app on multiple devices all using a world coordinate system that lines up with the same real-world environment. That's all you need to get multiple users experiencing the same static AR content, but if you want them to interact in AR, you'll need to use your favorite networking technology some more.

    Apple's basic multiuser AR demo shows encoding an ARAnchor and sending it to peers, so that one user can tap to place a 3D model in the world and all others can see it (a sketch of that anchor-sharing pattern also follows this list). The SwiftShot game example builds a whole networking protocol so that all users get the same gameplay actions (like firing slingshots at each other) and synchronized physics results (like blocks falling down after being struck). Both use Multipeer Connectivity.
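To make points 1 and 2 concrete, here's a minimal sketch of that flow. It's not taken from Apple's samples; the function names are mine, and `multipeerSession` stands in for whatever connected MCSession (or other transport) you've already set up:

    import ARKit
    import MultipeerConnectivity

    // Sender: capture the current world map, archive it, and ship it to peers.
    func shareWorldMap(from session: ARSession, over multipeerSession: MCSession) {
        session.getCurrentWorldMap { worldMap, error in
            guard let map = worldMap else {
                print("No world map yet: \(error?.localizedDescription ?? "unknown error")")
                return
            }
            do {
                // ARWorldMap conforms to NSSecureCoding, so it archives to Data.
                let data = try NSKeyedArchiver.archivedData(withRootObject: map,
                                                            requiringSecureCoding: true)
                try multipeerSession.send(data,
                                          toPeers: multipeerSession.connectedPeers,
                                          with: .reliable)
            } catch {
                print("Failed to encode or send world map: \(error)")
            }
        }
    }

    // Receiver: decode the map and relocalize by running a configuration
    // whose initialWorldMap is the received map.
    func relocalize(to data: Data, in session: ARSession) {
        guard let map = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                                from: data) else {
            print("Received data was not a valid ARWorldMap")
            return
        }
        let configuration = ARWorldTrackingConfiguration()
        configuration.initialWorldMap = map
        session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
    }

Once the receiving device has relocalized (you can watch the session camera's trackingState to know when), both sessions report positions in the same world coordinate system.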
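And for point 3, a similar sketch of the anchor-sharing pattern, again with placeholder names, minimal error handling, and the assumption that the world map has already been shared:

    import ARKit
    import MultipeerConnectivity

    // One user places a named anchor locally and broadcasts it.
    func place(anchorNamed name: String,
               at transform: simd_float4x4,
               in session: ARSession,
               sharedOver multipeerSession: MCSession) {
        let anchor = ARAnchor(name: name, transform: transform)
        session.add(anchor: anchor)

        // ARAnchor is also NSSecureCoding, so the same archiving trick works.
        if let data = try? NSKeyedArchiver.archivedData(withRootObject: anchor,
                                                        requiringSecureCoding: true) {
            try? multipeerSession.send(data,
                                       toPeers: multipeerSession.connectedPeers,
                                       with: .reliable)
        }
    }

    // Peers decode the anchor and add it to their own session; because both
    // sessions share a world map, it lands at the same real-world spot, and
    // each app can attach its 3D content in session(_:didAdd:) or renderer(_:didAdd:for:).
    func handleReceived(anchorData data: Data, in session: ARSession) {
        if let anchor = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARAnchor.self,
                                                                from: data) {
            session.add(anchor: anchor)
        }
    }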

(BTW, the second and third points above are where you get the "2 to 6" figure from @andy's answer — there's no limit on the ARKit side, because ARKit has no idea how many people may have received the world map you saved. However, Multipeer Connectivity has an 8 peer limit. And whatever game / app / experience you build on top of this may have latency / performance scaling issues as you add more peers, but that depends on your technology and design.)

Original answer below for historical interest...


This seems to be an area of active research in the iOS developer community — I met several teams trying to figure it out at WWDC last week, and nobody had even begun to crack it yet. So I'm not sure there's a "best way" yet, if even a feasible way at all.

Feature points are positioned relative to the session, and aren't individually identified, so I'd imagine correlating them between multiple users would be tricky.

The session alignment mode gravityAndHeading might prove helpful: that fixes all the directions to a (presumed/estimated to be) absolute reference frame, but positions are still relative to where the device was when the session started. If you could find a way to relate that position to something absolute — a lat/long, or an iBeacon maybe — and do so reliably, with enough precision... Well, then you'd not only have a reference frame that could be shared by multiple users, you'd also have the main ingredients for location based AR. (You know, like a floating virtual arrow that says turn right there to get to Gate A113 at the airport, or whatever.)
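Setting that alignment is trivial; this sketch (the wrapper function is just for illustration) shows it, with the caveat that sharing an origin is the part it doesn't solve:

    import ARKit

    // .gravityAndHeading aligns every device's axes to gravity and compass
    // north, so directions already agree between users; each session's origin
    // (where the device started) still differs, and reconciling that via GPS,
    // a beacon, a known marker, etc. is up to you.
    func runHeadingAlignedSession(on session: ARSession) {
        let configuration = ARWorldTrackingConfiguration()
        configuration.worldAlignment = .gravityAndHeading
        session.run(configuration)
    }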

Another avenue I've heard discussed is image analysis. If you could place some real markers — easily machine recognizable things like QR codes — in view of multiple users, you could maybe use some form of object recognition or tracking (a ML model, perhaps?) to precisely identify the markers' positions and orientations relative to each user, and work back from there to calculate a shared frame of reference. Dunno how feasible that might be. (But if you go that route, or similar, note that ARKit exposes a pixel buffer for each captured camera frame.)
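If you went the marker route, one possible starting point (hedged sketch only; the function name is mine, and the pose estimation and networking needed to actually build a shared frame of reference are left out) is to run Vision's barcode detector on that pixel buffer:

    import ARKit
    import Vision

    // Sketch: pull the current camera frame's pixel buffer out of ARKit and
    // look for barcodes (e.g. QR codes) in it. Turning a detection into a
    // marker pose and a shared frame of reference is the hard part not shown.
    func detectMarkers(in session: ARSession) {
        guard let frame = session.currentFrame else { return }

        let request = VNDetectBarcodesRequest { request, _ in
            guard let results = request.results as? [VNBarcodeObservation] else { return }
            for observation in results {
                // boundingBox is in normalized image coordinates; you'd combine
                // it with frame.camera to reason about where the marker is.
                print("Found \(observation.payloadStringValue ?? "a barcode") at \(observation.boundingBox)")
            }
        }

        let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
        try? handler.perform([request])
    }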

Good luck!

answered Sep 13 '25 by rickster