I have a set of polylines (numbering in the 100s of thousands, with each polyline having about 200-300 vertices). These represent routes on a map (all taken from Google Maps API, if that helps). The vertices are latitude/longitude coordinates.
I am now given a query polyline, and I have to find the "overlaps" of the query polyline with any of the existing polylines. The results will themselves be polylines, sorted from maximum to least overlap, and I only need the first 100 results or so. An additional issue is that the overlap need not be exact but can be approximate (i.e., portions of line segments that are deemed overlapping needn't lie exactly on one another, but only need to be "close" to each other).
To give a concrete example: in the left portion of the image below, the blue polyline (polyline A) is a polyline in the database and the red polyline (polyline B) is the query polyline. The algorithm should determine the polyline marked in thick black, as shown on the right.
I am currently leaning towards using a spatial database (the option under consideration is PostgreSQL + PostGIS), but I am not sure that the latency would be acceptable - the query needs to return results near instantaneously. My computational-geometry-fu is admittedly weak, but I was wondering: are there any existing algorithms or approaches that might prove useful to solve this particular problem?
Many thanks in advance!
A fast approximate query where you don't need to find all the matches smells like locality-sensitive hashing (http://en.wikipedia.org/wiki/Locality-sensitive_hashing), and I suspect that you will get loads of hits with this. A while ago I was intrigued by http://www.cs.ubc.ca/~lowe/papers/09muja.pdf; I have no idea whether it works in practice, but the same search that re-found the paper also turned up a library at http://www.cs.ubc.ca/research/flann/. The Wikipedia page on straight LSH has pointers to at least one implementation at the bottom as well. LSH has the advantage of translating neatly to database lookups with either relational databases or dbm files.
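To make that concrete, here is a minimal sketch of one way LSH could look for this problem; it is only an illustration, not the answer's prescription. It assumes each polyline is reduced to the set of coarse grid cells its vertices fall in and uses MinHash (an LSH family for Jaccard set similarity); the cell size, hash count, and band size are made-up parameters.

```python
import hashlib
from typing import Iterable, List, Tuple

def cell_ids(points: Iterable[Tuple[float, float]], cell_deg: float = 0.001) -> set:
    """Reduce a polyline to the set of coarse grid cells its vertices fall in.
    (cell_deg is a hypothetical resolution in degrees.)"""
    return {(int(lat / cell_deg), int(lon / cell_deg)) for lat, lon in points}

def minhash_signature(cells: set, num_hashes: int = 32) -> List[int]:
    """MinHash signature: for each of num_hashes hash functions, keep the minimum
    hash over all cells. Polylines covering similar cell sets get similar signatures.
    Assumes a non-empty cell set."""
    sig = []
    for i in range(num_hashes):
        sig.append(min(int(hashlib.md5(f"{i}:{c}".encode()).hexdigest(), 16)
                       for c in cells))
    return sig

def lsh_buckets(sig: List[int], band_size: int = 4) -> List[Tuple[int, ...]]:
    """Split the signature into bands; polylines whose signatures collide in any
    band become candidate matches."""
    return [tuple(sig[i:i + band_size]) for i in range(0, len(sig), band_size)]
```

The band keys are just small tuples, so they can be stored as rows in a relational table (or keys in a dbm file) and looked up directly, which is the "translates neatly to database lookup" point above.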
Given the large problem size, I suggest that you start with a gridding approach: overlay a square grid on top of the map and, for every tile (let us call them pixels), keep a list of the polylines that cross it. In a way, this amounts to performing a raster scan conversion of the map, using the Bresenham algorithm or a variant.
Similarly, you can draw the query polyline and collect all the polylines that share one or more pixels with it. Keeping a count of the common pixels gives a first estimate of the overlap length. It is advisable to draw a "thick" line to absorb the inaccuracies due to the discretization.
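A minimal sketch of this screening pass, in Python for concreteness. Everything here is an assumption for illustration: the cell size, the point-sampling of segments (a simple stand-in for a true Bresenham rasterization), and the "thick" query line approximated by also marking the 8 neighbouring cells.

```python
import math
from collections import Counter, defaultdict

CELL = 0.001  # grid resolution in degrees; a placeholder value, tune for your data

def cells_of(polyline, cell=CELL, thick=False):
    """Grid cells touched by a polyline given as a list of (lat, lon) pairs.

    Each segment is sampled at sub-cell spacing as a stand-in for Bresenham-style
    rasterization; thick=True also marks the 8 neighbours of each cell to absorb
    discretization error."""
    out = set()
    for (y0, x0), (y1, x1) in zip(polyline, polyline[1:]):
        steps = max(1, int(math.hypot(y1 - y0, x1 - x0) / (cell / 2)))
        for s in range(steps + 1):
            t = s / steps
            cy = math.floor((y0 + t * (y1 - y0)) / cell)
            cx = math.floor((x0 + t * (x1 - x0)) / cell)
            if thick:
                out.update((cy + dy, cx + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            else:
                out.add((cy, cx))
    return out

def build_index(polylines):
    """Preprocessing: map each cell to the ids of the polylines crossing it."""
    index = defaultdict(set)
    for pid, poly in enumerate(polylines):
        for c in cells_of(poly):
            index[c].add(pid)
    return index

def candidates(index, query_polyline, top_k=100):
    """Screening pass: count shared cells per stored polyline, best first."""
    counts = Counter()
    for c in cells_of(query_polyline, thick=True):
        for pid in index.get(c, ()):
            counts[pid] += 1
    return counts.most_common(top_k)
```

candidates() returns (polyline id, shared-pixel count) pairs; the counts are only a proxy for overlap length, so the surviving candidates would still go through the exact check described next.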
After this first screening pass, the number of polylines to be considered will be drastically smaller, so that any brute-force approach can be used for the actual overlap assessment.
One critical issue is the grid resolution: too coarse and candidate rejection becomes ineffective; too fine and preprocessing time/space grows unacceptably.
Assuming a grid of W x H pixels, you will need W x H linked-list heads plus N x L node pointers (for N polylines of average length L, measured in pixels, not in vertex count). The first term grows as the square of the resolution, while the second grows only linearly. Preprocessing time is linear in the size of this data structure (W x H to initialize the lists, N x L for the Bresenham line drawings).
A query will cost roughly L' x K, where L' is the length of the query polyline (in pixels) and K is the number of overlapping polylines found (in case K >> 1, use an efficient dictionary structure for the bookkeeping of the K candidates). This cost is proportional to the resolution.
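To put purely illustrative numbers on this (all figures hypothetical): a 10,000 x 10,000 grid costs 10^8 list heads up front, and 200,000 polylines averaging, say, 1,000 pixels each add another 2 x 10^8 node pointers, so the two terms are comparable at that resolution. On the query side, a 1,000-pixel query polyline overlapping K = 500 candidates does on the order of L' x K = 5 x 10^5 dictionary updates in the worst case, which is comfortably within interactive latency.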
PS: If the chosen resolution is such that you can assume no more than one polyline per pixel (this is an approximation), then the algorithm simplifies to: draw the whole map, with every polyline in a different color; then draw the query polyline and note down the colors you cross. This is exactly what you sketched!