I'm trying to estimate depth from a stereo system with two cameras. The simple equation that I use is:
Depth = (Baseline * Focal) / Disparity
Is it true that the field of view of the two cameras doesn't change the maximum measurable depth, and that it only changes the minimum measurable depth?
Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues, such as texture variations and gradients, defocus, color/haze, etc.
Depth Estimation is the task of measuring the distance of each pixel relative to the camera. Depth is extracted from either monocular (single) or stereo (multiple views of a scene) images. Traditional methods use multi-view geometry to find the relationship between the images.
Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR.
How do we estimate depth? Our eyes estimate depth by comparing the image obtained by our left and right eye. The minor displacement between both viewpoints is enough to calculate an approximate depth map. We call the pair of images obtained by our eyes a stereo pair.
At the top end, the measurable depth is limited by the resolution of the cameras you use, which is reflected in the disparity. As depth becomes greater, the disparity tends to zero. With a greater field of view, the disparity will effectively reach zero at a smaller depth. Thus a greater field of view lowers the maximum measurable depth, but you can compensate somewhat by using higher-resolution cameras.
To clarify: you should note that (if you do things correctly) you measure disparity in pixels but then convert it to meters (or millimeters, as I do below). The full formula is then:
Depth = (Baseline * Focal length) / (Pixel disparity * Pixel size)
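As a minimal Python sketch of that formula (the function and argument names are my own, not from any particular library), with all lengths in millimeters:

    def depth_mm(baseline_mm, focal_length_mm, disparity_px, pixel_size_mm):
        """Depth = (baseline * focal length) / (pixel disparity * pixel size), all lengths in mm."""
        if disparity_px <= 0:
            raise ValueError("disparity must be positive; zero disparity means the point is at infinity")
        return (baseline_mm * focal_length_mm) / (disparity_px * pixel_size_mm)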
Suppose you have the following setup:
Baseline (b) = 8 cm (80 mm)
Focal length (f) = 6.3 mm
Pixel size (p) = 14 um (0.014 mm)
The smallest disparity you can measure is 1 pixel. With the known numbers this translates to:
Depth = (80*6.3)/(1*0.014) = 36,000 mm = 36 m
So in these circumstances this would be your cap. Note that your measurement is wildly inaccurate at this range. The next possible disparity (2 pixels) occurs at a depth of 18 m, the next after that (3 pixels) at 12 m, etc. Doubling your baseline would double the range to 72 m. Doubling your focal length would also double your range, but note that both would negatively affect you at the short end. You could also increase your maximum depth by decreasing the pixel size.
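To see how coarse the depth steps become near the cap, here is a small illustrative Python loop (plain arithmetic, no stereo library assumed) using the numbers above:

    baseline_mm = 80.0      # 8 cm baseline
    focal_length_mm = 6.3   # focal length
    pixel_size_mm = 0.014   # 14 um pixels

    for disparity_px in range(1, 6):
        depth_m = (baseline_mm * focal_length_mm) / (disparity_px * pixel_size_mm) / 1000.0
        print(f"{disparity_px} px -> {depth_m:4.1f} m")
    # 1 px -> 36.0 m, 2 px -> 18.0 m, 3 px -> 12.0 m, 4 px -> 9.0 m, 5 px -> 7.2 m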
At a pixel size of 0.014 mm, you are probably talking about a CCD with a horizontal resolution of something like 1024 pixels, which makes the CCD about 14.3 mm wide. If you double the number of pixels in the same area, you would double your maximum range without losing anything at the near end (because the limitations there are determined by the baseline and focal length, which stay the same).
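As a quick check of that resolution argument (the 1024-pixel sensor is only an assumption, as noted above), halving the pixel size over the same sensor width doubles the depth that corresponds to a 1-pixel disparity:

    baseline_mm, focal_length_mm = 80.0, 6.3
    sensor_width_mm = 1024 * 0.014                    # assumed ~14.3 mm wide CCD
    for pixel_size_mm in (0.014, 0.007):              # 1024 px vs 2048 px across the sensor
        pixels_across = round(sensor_width_mm / pixel_size_mm)
        max_depth_m = (baseline_mm * focal_length_mm) / (1 * pixel_size_mm) / 1000.0
        print(f"{pixels_across} px wide -> max depth {max_depth_m:.0f} m")
    # 1024 px wide -> max depth 36 m, 2048 px wide -> max depth 72 m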
This is a very good overview of the trade-offs in depth measurement in stereo vision. And this article on Wikipedia has some good info on the relationship between pixel size, CCD size, focal length and field of view.