So I'm working on a project where me and a buddy of mine scanned a room using the KINECTv2 and made a 3D model out of it. The goal is to make it possible to add 3d models of different kinds of furniture in real time. To that goal I'm trying out different plane-fitting algorithms in order to find wich one would work the fastest. Does anybody have any suggestions? So far I've only researched the usage of the basic RANSAC algorithm included in PCL.
Two common approaches for plane fitting are RANSAC and Hough. Here's one performance comparison:
https://www.researchgate.net/publication/259519997_Continuous_plane_detection_in_point-cloud_data_based_on_3D_Hough_Transform
As with many problems in computational geometry and image processing, rather than consider what is "fastest," consider what is optimal for your application in terms of performance, development effort, and cost. Searching for the fastest possible algorithm could lead you down a path of horrifying cost and complexity, whereas you may able to implement a data processing chain of relatively simpler algorithms that run just fast enough to present a smooth and pleasant experience to the user.
Long story short, I recommend starting with a Hough plane fit. Hough transform algorithms are relatively easy to write (once you grasp the basics), and tweaking parameters is intuitive.
https://en.wikipedia.org/wiki/Hough_transform
One of the reasons to write your own algorithm is that you'll be in a better position to understand what changes need to be made once (not if) you discover the point cloud data is noisier and less well behaved than one would like.
Achieving good speed is going to rely on a number of factors, including these:
Pre-processing. You don't need to fit planes of arbitrary size to arbitrary point clouds: instead, you need to fit walls and maybe floors and ceilings. For Hough algorithms this means you can limit the range over which parameters are tested, and hence speed up processing.
Rather than try to find all the plane fits for the complete, original point cloud, find ways to break up the point cloud into clusters of subclouds to which plane fit tests can be run more efficiently.
PCL can calculate surface normals for you. If you can identify clusters of surface normals pointed in roughly the same direction, and then try plane fits for individual clusters, you should be able to speed things up considerably.
Also, for your first pass you'll probably want to downsample your data and try fits on relatively fewer points. This is analogous to creating an "image pyramid" for 2D processing.
Octrees are nice, simple means of dividing up space for queries, collision tests, and so on. An octree divides a space into eight nodes or "octants." This can be imagined as cutting a cube into eight smaller cubes. Then each octant is further divided into eight octants, and so on. If an octant (a node) doesn't contain points, you don't need to divide it further.
https://en.wikipedia.org/wiki/Octree
Parameterization. The descriptions above should make it clear that if you can preprocess the data by simplifying and/or breaking up the original raw point cloud, then you'll be able to test much more narrowly defined searches that will run more quickly.
For that matter, you probably don't need high accuracy in plane fits. You can generate reasonably good fits, then tweak those fits to generate ceilings, walls, and floors at right angles to each other.
3D data quality. The Kinect v2 is a time-of-flight device with some inherent measurement accuracy problems. For example, if you take a single image of a flat wall and then check the depth values, you'll notice some non-planar goofiness in the corners of the image. If you take a look at the range (or standard deviation) of depth values at each (x,y) pixel over multiple images, then you'll also notice differences in noisiness between center pixels and edge pixels.
Once you perform a plane fit, generate a measure of the fit quality. This requires going back through the data to calculate point-to-plane distances for the points used for calculation. (To speed this up, only use every Nth point or randomly sample the points.) As you tinker with parameters you'll see the effects in terms of speed and fit quality.
Real time vs. Perceptibly smooth. If you only need the user to move furniture around in real time, it should be okay to spend longer generating the initial plane fits.
Multithreading/Parallelism To handle data input, plane fitting, and user interface you'll almost certain have to think hard about multithreading. To test algorithms you do work on the UI thread just to get started, but this is limiting.
An application like this begs for CUDA or OpenCL. For the 3D display you'll be using the graphics card anyway. Although you don't need to jump into GPU programming right away, it's helpful to keep in mind how algorithms could be parallelized.
3D display. Were you planning to use Direct3D or OpenGL for 3D display and interaction? Implementing software to allow the user to "add 3d models of different kinds of furniture in real time" suggests you'll have to rely on the graphics card.
If you can display the point cloud in a 3D view, maybe you don't even need plane fits. You might even get away with collision detection: if the 3D model of a chair bumps into a cluster of points (i.e. a wall), then you might just detect that collision rather than try to fit planes to define bounds. Octants and other space-dividing techniques will help speed up collision tests.
The company Matterport (http://matterport.com/) has developed something very much like what you're describing. If nothing else you can give their software a try and consider what might be improved/adapted for your needs.
I appreciate Rethunk's detailed comments and provide a variant of Local Hough Transform. But first, let me point out that there are a bunch of stackoverflow / stackexchange posts on plane detection or detection of intersecting planes. Some of those are:
The method I'll suggest is explained in detail in a publication at 3DV 2015:
Local Hough Transform for 3D Primitive Detection, Bertram Drost, Slobodan Ilic, IEEE 3D Vision 2015
The idea is based on selecting two oriented pairs of points. The orientations of these points are compared to decide whether the points jointly lie on a plane or not. The contributions of all such pairs of points are combined in a local voting space, where the plane is parameterized in a 0-dimensional voting space (an oriented point fully determines the plane). The technique is extendible to different primitives.
RANSAC is generally inferior to the Hough transform and yet the proposed method can be seen as a hybrid between a global voting scheme and RANSAC. While RANSAC selects multiple random points, enough to fit the target primitive, the proposed method selects only a single point, the reference point.
I also have another stackexchange post explaining how one could potentially develop a similar method for orthogonal planes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With