We need to build a model of the shop floor that relates pixel coordinates (x, y) in camera images to actual objects in the 3D space of the store. The camera images that serve as the source for this model suffer from fish-eye distortion, so straight lines appear as curves and the walls do not appear to meet at exact right angles.
We are subdividing the image into polygons. Each polygon corresponds to a particular region such as a shelf, display area, or checkout counter. We want to map every pixel that falls inside a polygon to the shelf corresponding to that region.
Any ideas on how to go about it?
Following is a sample image of the store with some polygons marked:
EDIT: We are not trying to find the 3D coordinates; we just need to know which shelf each polygon maps to, so that when the user clicks on a polygon we can say which shelf was clicked.
We can manage this for big polygons like the ones shown in the image, but shelves far from the camera can be as small as a few pixels. For those we need some kind of probabilistic result: if the user clicked at (x, y), what is the probability that they were trying to click on Shelf-A, on Shelf-B, and so on?
Basically, what we are looking for is a probability function that returns the probabilities of a click on each nearby object when a small polygon (or a single pixel) is clicked on the 2D image.
EDIT2: One thing that is not apparent from the sample image is that the polygons can be really small (as small as a few pixels) and really close to each other.
Moreover, the use case is that a customer in the store picks a product from one of the shelves. The application user then clicks on the point in the image where they think the product was picked up. Since the polygons are so small and so close together, the user can only guess the exact point of pickup, so at best we know that it could be any one of the 3-4 polygons close to the click. So the question is: how do we calculate probabilities for these 3-4 polygons given the click?
As suggested here, the distance of the click from the center of the polygon and the polygon's area could be parameters in calculating this probability. What I am wondering is whether there is an algorithm for doing so.
We are not trying to find the 3D coordinates; we just need to know which shelf each polygon maps to, so that when the user clicks on a polygon we can say which shelf was clicked.
I assume you have a mapping from polygon to shelf name, for example as a list of (polygon, shelf name) pairs. You can build it by hand once if the cameras are fixed and don't move. Then your problem reduces to finding which polygon a point belongs to.
If you use OpenCV, you can use its pointPolygonTest function. Otherwise you can write a similar function yourself; see, for example, the ray casting algorithm. Then look through the list until you find a polygon that the point lies within.
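A minimal sketch of the ray casting approach in plain Python may help make the lookup concrete. The names point_in_polygon, find_shelf and the SHELVES list are hypothetical, and the coordinates are made up for illustration; points lying exactly on an edge are not treated specially.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: cast a horizontal ray from pt towards +x and count
    how many polygon edges it crosses. An odd count means pt is inside."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_shelf(pt: Point, shelves) -> Optional[str]:
    """Return the name of the first shelf whose polygon contains pt."""
    for poly, name in shelves:
        if point_in_polygon(pt, poly):
            return name
    return None


# Hypothetical (polygon, shelf name) pairs in pixel coordinates
SHELVES = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], "Shelf-A"),
    ([(12, 0), (20, 0), (20, 6), (12, 6)], "Shelf-B"),
]

print(find_shelf((5, 5), SHELVES))
print(find_shelf((11, 3), SHELVES))
```

The first call falls inside the first polygon and the second call falls in the gap between the two, so it returns None.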
To further optimize the program you can precalculate each polygon's extents (its axis-aligned bounding box). The extents let you quickly tell when a point is definitely not inside a polygon, so that you only run the full test on the remaining polygons. But with as few polygons as you have in the image, I would not bother.
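If the number of polygons does grow, the extents check could look roughly like the sketch below. The function names and the sample data are hypothetical; the idea is only to discard polygons whose bounding box does not contain the point before running the exact test on the rest.

```python
def bounding_box(poly):
    """Axis-aligned extents of a polygon as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(pt, box):
    """Cheap rejection test: is pt inside the bounding box?"""
    x, y = pt
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def candidate_shelves(pt, boxed_shelves):
    """Names of shelves whose extents contain pt. Only these still need
    the exact point-in-polygon test."""
    return [name for box, _poly, name in boxed_shelves if in_box(pt, box)]


# Hypothetical shelves; the extents are computed once, up front.
polygons = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], "Shelf-A"),
    ([(12, 0), (20, 0), (16, 6)], "Shelf-B"),
]
BOXED = [(bounding_box(poly), poly, name) for poly, name in polygons]

print(candidate_shelves((13, 5), BOXED))
```

Note that (13, 5) is inside Shelf-B's extents but outside its triangle, which is why the exact test is still needed for the candidates.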
Basically, what we are looking for is a probability function that returns the probabilities of a click on each nearby object when a small polygon (or a single pixel) is clicked on the 2D image.
Just run an experiment: ask the operator to click a single highlighted pixel and accumulate statistics on where they actually click. Once you have this, it's easy to predict the number of out-of-object clicks and how far off they are likely to be.
Without such an experiment, run with exactly the same kind of person, the same usage conditions and the same pointing device you are going to use, you cannot really tell how far off the clicks are going to be. I believe that many people are sniper clickers if the mouse is good and they can see the image well. If they are forced to use a touch interface or some other pointing device, the precision may be lower.
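One way to turn such measurements into per-shelf probabilities is sketched below. It is only an illustration under stated assumptions: the click error is modelled as an isotropic Gaussian whose standard deviation sigma (in pixels) would come from the experiment, every pixel of every shelf is treated as an equally likely target, and shelf_probabilities and the sample polygons are hypothetical. Each shelf's weight is the Gaussian density summed over its pixels, so both the distance from the click and the polygon's area influence the result.

```python
import math


def _inside(x, y, poly):
    """Ray casting point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def shelf_probabilities(click, shelves, sigma):
    """Assumed model: P(shelf | click) is proportional to the Gaussian
    density (std sigma, centred on the click) summed over the centres of
    the pixels that lie inside the shelf's polygon."""
    cx, cy = click
    weights = {}
    for poly, name in shelves:
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        w = 0.0
        for px in range(math.floor(min(xs)), math.ceil(max(xs)) + 1):
            for py in range(math.floor(min(ys)), math.ceil(max(ys)) + 1):
                sx, sy = px + 0.5, py + 0.5  # pixel centre
                if _inside(sx, sy, poly):
                    d2 = (sx - cx) ** 2 + (sy - cy) ** 2
                    w += math.exp(-d2 / (2.0 * sigma ** 2))
        weights[name] = weights.get(name, 0.0) + w
    total = sum(weights.values())
    if total == 0.0:  # click is too far from every shelf
        return {name: 0.0 for name in weights}
    return {name: w / total for name, w in weights.items()}


# Hypothetical shelves: two small squares, 2 pixels apart
shelves = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], "Shelf-A"),
    ([(6, 0), (10, 0), (10, 4), (6, 4)], "Shelf-B"),
]
print(shelf_probabilities((5, 2), shelves, sigma=3.0))  # click midway
print(shelf_probabilities((2, 2), shelves, sigma=3.0))  # click inside A
```

A click midway between two identical polygons splits the probability evenly, while a click inside Shelf-A favours Shelf-A. If the measured click offsets turn out not to look Gaussian, the exponential term could be replaced by the empirical distribution from the experiment.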