
Future prospects for improvement of depth data on Project Tango tablet

I am interested in using the Project Tango tablet for 3D reconstruction using arbitrary point features. In the current SDK version, we seem to have access to the following data.

  • A 1280 x 720 RGB image.
  • A point cloud with anywhere from 0 to roughly 10,000 points, depending on the environment; in most environments it averages between 3,000 and 6,000 points.

What I really want is to be able to identify a 3D point for each keypoint within an image. Therefore, it makes sense to project the depth data into the image plane. I have done this, and I get something like this:

[Image: sparse depth points projected onto the RGB image]
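For reference, that projection step can be sketched roughly as follows (a minimal pinhole-model sketch in Python/NumPy; it assumes the point cloud has already been transformed into the RGB camera frame, that fx, fy, cx, cy, width, height come from the Tango camera intrinsics, and it ignores lens distortion):

```python
import numpy as np

def project_to_image(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (N x 3, camera frame, metres) to pixel coordinates.

    Returns (u, v, z) for the points that land inside the image.
    Plain pinhole model; real code would also apply the distortion
    coefficients reported by the camera intrinsics.
    """
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    valid = Z > 0                      # keep only points in front of the camera
    u = fx * X[valid] / Z[valid] + cx
    v = fy * Y[valid] / Z[valid] + cy
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return u[inside], v[inside], Z[valid][inside]
```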

The problem with this approach is that the depth points are sparse compared to the RGB pixels. So I took it a step further and interpolated between the depth points: first I computed a Delaunay triangulation, and once I had a good triangulation, I interpolated among the three vertices of each facet and got a decent, fairly uniform depth image. Here are the zones where the interpolated depth is valid, superimposed on the RGB image.

[Image: regions of valid interpolated depth superimposed on the RGB image]
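The triangulate-and-interpolate step maps naturally onto SciPy's LinearNDInterpolator, which itself performs a Delaunay triangulation and linear (barycentric) interpolation inside each facet. A minimal sketch, assuming u, v, z are the projected samples from the step above:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def interpolate_depth(u, v, z, width, height):
    """Build a dense depth image from sparse projected depth samples.

    LinearNDInterpolator triangulates the (u, v) samples (Delaunay) and
    interpolates linearly inside each facet; pixels outside the convex
    hull of the samples stay NaN, i.e. "no valid depth".
    """
    interp = LinearNDInterpolator(np.column_stack([u, v]), z)
    uu, vv = np.meshgrid(np.arange(width), np.arange(height))
    depth = interp(uu, vv)             # shape (height, width), NaN where invalid
    return depth
```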

Now, given the camera model, it's possible to project depth back into Cartesian coordinates at any point on the depth image (since the depth image was made such that each pixel corresponds to a point on the original RGB image, and we have the camera parameters of the RGB camera). However, if you look at the triangulation image and compare it to the original RGB image, you can see that depth is valid for all of the uninteresting points in the image: blank, featureless planes mostly. This isn't just true for this single set of images; it's a trend I'm seeing for the sensor. If a person stands in front of the sensor, for example, there are very few depth points within their silhouette.
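The back-projection described above is just the inverse pinhole mapping. A minimal sketch, again ignoring distortion and assuming `depth` is the dense (NaN-padded) depth image and fx, fy, cx, cy are the RGB camera intrinsics:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with its interpolated depth into the
    RGB camera frame (pinhole model, distortion ignored)."""
    z = depth[int(round(v)), int(round(u))]
    if np.isnan(z):
        return None                    # no interpolated depth at this pixel
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```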

As a result of this characteristic of the sensor, when I perform visual feature extraction on the image, most of the areas with corners or interesting textures fall in areas without associated depth information. As an example: I detected 1,000 SIFT keypoints in an RGB image from an Xtion sensor, and 960 of those had valid depth values. If I do the same thing with this system, I get around 80 keypoints with valid depth. At the moment, this level of performance is unacceptable for my purposes.
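A rough way to reproduce this measurement with OpenCV might look like the following (cv2.SIFT_create needs OpenCV 4.4 or newer; older builds expose SIFT via cv2.xfeatures2d). The depth image and its NaN convention are the ones assumed above:

```python
import cv2
import numpy as np

def keypoints_with_depth(rgb, depth):
    """Detect SIFT keypoints and collect those that fall on pixels with a
    valid (interpolated) depth value; `depth` uses NaN for invalid pixels."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)
    h, w = depth.shape
    valid = []
    for kp in keypoints:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if 0 <= u < w and 0 <= v < h and not np.isnan(depth[v, u]):
            valid.append(kp)
    return keypoints, valid
```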

I can guess at the underlying reasons for this: it seems like some sort of plane extraction algorithm is being used to get depth points, whereas Primesense/DepthSense sensors are using something more sophisticated.

So anyway, my main question here is: can we expect any improvement in the depth data at a later point in time, through improved RGB-IR image processing algorithms? Or is this an inherent limit of the current sensor?

asked Dec 26 '14 by indianajohn

2 Answers

I am from the Project Tango team at Google. I am sorry you are experiencing trouble with depth on the device. Just so that we are sure your device is in good working condition, could you please test the depth performance against a flat wall? Instructions are here: https://developers.google.com/project-tango/hardware/depth-test

Even with a device in good working condition, the depth library is known to return sparse depth points on scenes with low-IR-reflectance objects, small objects, high-dynamic-range scenes, surfaces at certain angles, and objects at distances larger than ~4 m. While some of these are inherent limitations of the depth solution, we are working with the depth-solution provider to bring improvements wherever possible.

Attached is an image of a typical conference-room scene and the corresponding point cloud. As you can see, no depth points are returned from the laptop screen (low reflectance), the tabletop objects such as post-its and the pencil holder (small object size), large portions of the table (surface at an angle), or the room corner at the far right (distance > 4 m).

But as you move the device around, you will start getting depth returns from these areas. Accumulating depth points across frames is a must to get denser point clouds.

Please also keep us posted on your findings at [email protected]

[Image: typical conference-room scene and the corresponding point cloud]

answered by Ashish Shah


In my very basic initial experiments, you are correct with respect to the depth information returned from the visual field; however, the return of surface points is anything but constant. As I move the device, I get major shifts in where depth information is returned, i.e. there is a lot of transitory opacity in the image with respect to depth data, probably due to the characteristics of the surfaces.

So, since no single return frame is enough, the real question seems to be the construction of a larger model (a point cloud to start, possibly voxel spaces as one scales up) that brings successive scans into a common model. It's reminiscent of synthetic-aperture algorithms in spirit, but the letters in the equations come from a whole different set of laws. In short, I think the more interesting approach is to synthesize a more complete model by successive accumulation of point cloud data. For this to work, the device team has to have their dead reckoning on the money at whatever scale this is done. This also addresses an issue that no sensor improvement can fix: even a perfect visual sensor does nothing to help you relate the sides of an object to, at the very least, the close neighborhood of its front.
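A bare-bones sketch of the accumulation idea: transform each frame's point cloud into a common world frame using the pose reported by the device's motion tracking, then concatenate. The (points, R, t) interface here is purely illustrative, and a real pipeline would add registration (e.g. ICP) or voxel fusion on top to cope with pose drift:

```python
import numpy as np

def accumulate(clouds_and_poses):
    """Merge per-frame point clouds into one world-frame cloud.

    `clouds_and_poses` yields (points, R, t), where `points` is an (N, 3)
    array in the depth-camera frame and (R, t) is that frame's
    camera-to-world rotation and translation from motion tracking.
    """
    world_points = []
    for points, R, t in clouds_and_poses:
        world_points.append(points @ R.T + t)   # rigid transform per frame
    return np.vstack(world_points)
```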

answered by Mark Mullin