Is it feasible to use a neural network to estimate distance in a still image or video stream?
I have a laser range finder that provides video output as well as a distance measurement. However, the distance measurement requires shining a laser into the environment, which isn't always ideal or allowed. I'd like to have an option to switch this into "passive" mode where the image is fed to a neural network, which then provides a distance estimate without the need to activate the laser. The network would initially be trained on the image + distance pairs collected by the range finder in active mode.
I'm no expert on neural networks, and although Google finds lots of uses of neural networks for image classification and pose estimation, I can't find any prior art for distance estimation. Does this seem practical, or am I wasting my time? Would a basic feed-forward network with one input per N pixels be enough, or would I need a different architecture?
Yes, it is possible, assuming you have ground-truth data for training. As early as 2006 there were publications on this subject, though using Markov Random Fields; you can read it here. More recently it was done with Convolutional Neural Networks and Deep Convolutional Neural Fields. Those three examples estimate the depth of every single pixel in the image, so they need a correct measurement for each one.
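To make the dense (per-pixel) approach concrete, here is a minimal sketch in PyTorch, assuming you have RGB frames paired with full ground-truth depth maps. The network shape, layer sizes, and L1 loss are illustrative placeholders, not a recommendation from the papers above:

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Toy fully convolutional net: RGB image in, one depth value per pixel out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # H/2 x W/2
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # H/4 x W/4
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # back to H/2
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),   # back to H
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One toy training step with random tensors standing in for real image/depth pairs.
model = TinyDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

images = torch.rand(4, 3, 64, 64)     # batch of RGB images
gt_depth = torch.rand(4, 1, 64, 64)   # ground-truth depth map per image

pred = model(images)
loss = loss_fn(pred, gt_depth)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```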
If you're using a planar range finder, you'll only have correct depth values along certain columns of your image, depending on your laser's resolution. This may mean you need to train your NN on single rows of pixels from your images instead of full images. For full-scene depth extraction, people usually employ binocular cameras or something like a Kinect (just for training, of course).
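A rough sketch of that row-based setup, again in PyTorch and with hypothetical names: pull out the image row that lines up with the planar laser sweep, pair it with the per-column distance readings, and fit a small 1D convolutional model. The row index, frame size, and loss are assumptions for illustration:

```python
import torch
import torch.nn as nn

def extract_scan_row(image, scan_row_index):
    """image: (3, H, W) tensor; returns the (3, W) row the laser sweeps across."""
    return image[:, scan_row_index, :]

class RowDepthNet(nn.Module):
    """Maps a (3, W) pixel row to W distance estimates, one per column."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=5, padding=2),
        )

    def forward(self, row):              # row: (batch, 3, W)
        return self.net(row).squeeze(1)  # (batch, W) distances

# Toy example with random data standing in for a real frame and laser readings.
model = RowDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

frame = torch.rand(3, 480, 640)                    # one RGB frame
row = extract_scan_row(frame, 240).unsqueeze(0)    # row at the laser's height
laser_distances = torch.rand(1, 640)               # one reading per image column

loss = nn.functional.l1_loss(model(row), laser_distances)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```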