How to group latitude/longitude points that are 'close' to each other?

Tags:

sql

database

location

geolocation

cluster-analysis

I have a database of user-submitted latitude/longitude points and am trying to group 'close' points together. 'Close' is relative, but for now it seems to be ~500 feet.

At first it seemed I could just group rows that have the same latitude/longitude for the first 3 decimal places (roughly a 300 x 300 foot box, understanding that the east-west size shrinks as you move away from the equator).

However, that method seems to be quite lacking. 'Closeness' can't be significantly different from the distance each decimal place represents, and it doesn't take into account that two locations may have different digits in the 3rd (or any) decimal place but still be within the distance that place represents (33.1239 and 33.1240).
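To make the boundary problem concrete, here is a small Python sketch (not part of the original question) that applies the "same first 3 decimal places" key and compares it with a great-circle (haversine) distance. It assumes a spherical Earth with a 6,371 km mean radius; the function names and sample coordinates are hypothetical. The sample uses 33.1239 and 33.1241 rather than 33.1240 only to keep floating-point rounding away from the exact cell edge.

```python
import math

# Mean Earth radius (6,371 km) expressed in feet; a spherical-Earth assumption.
EARTH_RADIUS_FT = 20_902_231

def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))

def decimal_place_key(lat, lon, places=3):
    """Group key from the question: keep only the first `places` decimals (truncated)."""
    factor = 10 ** places
    return (math.floor(lat * factor), math.floor(lon * factor))

a = (33.1239, -117.1500)
b = (33.1241, -117.1500)
print(decimal_place_key(*a) == decimal_place_key(*b))  # False: different "boxes"
print(round(haversine_ft(*a, *b)))                      # roughly 73 (feet)
```

Two points a few dozen feet apart land in different groups, which is exactly the weakness described above.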

I've also mulled over the situation where Point A and Point C are both 'close' to Point B (but not to each other). Should they be grouped together? If so, what happens when Point D is 'close' to Point C (and no other points); should it be grouped as well? Certainly I have to determine the desired behavior, but how would either be implemented?
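One way to see the two behaviours: if closeness is treated as transitive (A, B, C and D all end up together), grouping is just finding the connected components of a "within threshold" graph, which is what single-linkage clustering produces. The Python sketch below is an illustration under assumptions, not the asker's code: it takes hypothetical points already converted to planar feet and uses a brute-force O(n^2) pair check with a union-find structure. If D should not be pulled in, you need a non-transitive rule instead (for example, requiring every member to be within the threshold of a group centre).

```python
import math
from itertools import combinations

def chained_groups(points, threshold):
    """Group points so that any two points within `threshold` share a group,
    transitively (A~B and B~C puts A, B, C together).
    `points` maps a name to planar (x, y) coordinates in feet; O(n^2) pairs."""
    parent = {name: name for name in points}

    def find(n):
        # Walk up to the group's root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for p, q in combinations(points, 2):
        if math.dist(points[p], points[q]) <= threshold:
            parent[find(p)] = find(q)  # merge the two groups

    groups = {}
    for name in points:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical layout from the question: A and C are each 400 ft from B but
# 800 ft from each other; D is 450 ft from C only; E is far from everything.
pts = {"A": (0, 0), "B": (400, 0), "C": (800, 0), "D": (1250, 0), "E": (5000, 0)}
print(chained_groups(pts, 500))  # [['A', 'B', 'C', 'D'], ['E']]
```

Lowering the threshold to 300 ft leaves every point on its own, since no pair is that close.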

Can anyone point me in the right direction as to how this can be done and what different methods/approaches can be used?

I feel a bit like I'm missing something obvious.

Currently the data is in a MySQL database, used by a PHP application; however, I'm open to other storage methods if they're key to accomplishing this.


asked Dec 03 '10 19:12

Tim Lytle

2 Answers

There are a number of ways of determining the distance between two points, but for plotting points on a 2-D graph you probably want the Euclidean distance. If (x1, y1) represents your first point and (x2, y2) represents your second, the distance is

d = sqrt( (x2-x1)^2 + (y2-y1)^2 )
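The formula translates directly into code. One caveat not raised in the answer: raw latitude/longitude degrees are not equal-sized units, so a common shortcut (assumed here, the equirectangular approximation) is to convert to local feet first, shrinking longitude by cos(latitude). The 364,000 feet-per-degree constant is a rounded figure, and `to_local_feet` is a hypothetical helper.

```python
import math

def euclidean(p1, p2):
    """d = sqrt((x2-x1)^2 + (y2-y1)^2) for planar points p1=(x1, y1), p2=(x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

FEET_PER_DEGREE_LAT = 364_000  # rough figure; one degree of latitude is ~111 km

def to_local_feet(lat, lon, ref_lat):
    """Equirectangular approximation: turn degrees into planar feet near ref_lat,
    shrinking longitude by cos(latitude) so both axes share one unit."""
    x = lon * FEET_PER_DEGREE_LAT * math.cos(math.radians(ref_lat))
    y = lat * FEET_PER_DEGREE_LAT
    return (x, y)

print(euclidean((0, 0), (3, 4)))  # 5.0
```

The approximation is reasonable over a few hundred feet, which matches the scale in the question; over long distances a great-circle formula is more appropriate.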

Regarding grouping, you may want to use some sort of 2-D mean to determine how "close" things are to each other. For example, if you have three points, (x1, y1), (x2, y2), (x3, y3), you can find the center of these three points by simple averaging:

x(mean) = (x1+x2+x3)/3
y(mean) = (y1+y2+y3)/3

You can then see how close each is to the center to determine whether it should be part of the "cluster".
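A minimal Python sketch of that idea, generalizing the three-point average to any number of points; the function names and the `max_dist` cutoff are hypothetical, and the points are assumed to be planar (x, y) values such as feet.

```python
import math

def centroid(points):
    """2-D mean: average the x values and the y values separately."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def members_near_center(points, max_dist):
    """Keep only the points within max_dist of the group's centroid."""
    center = centroid(points)
    return [p for p in points if math.dist(p, center) <= max_dist]

pts = [(0, 0), (300, 0), (0, 300)]
print(centroid(pts))                  # (100.0, 100.0)
print(members_near_center(pts, 250))  # [(0, 0), (300, 0), (0, 300)]
```

A single distant point drags the centroid toward itself, so this check works best once you already have a rough candidate group.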

There are a number of ways one can define clusters, all of which use some variant of a clustering algorithm. I'm in a rush now and don't have time to summarize, but check out the link and the algorithms, and hopefully other people will be able to provide more detail. Good luck!
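The answer does not name a specific algorithm. As one commonly used example (swapped in here, not the answerer's choice), DBSCAN groups points that have at least `min_pts` neighbours within `eps` and labels isolated points as noise. The Python sketch below is a brute-force O(n^2) version over hypothetical planar points in feet; for real data you would likely want a library implementation backed by a spatial index.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over planar points. Returns one label per point:
    0, 1, 2, ... for clusters and -1 for noise. O(n^2) neighbour search."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:  # not a core point; may become a border point later
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand from it
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels

# Hypothetical points: A, B, C, D spaced as in the question, E far away.
pts = [(0, 0), (400, 0), (800, 0), (1250, 0), (5000, 0)]
print(dbscan(pts, eps=500, min_pts=3))  # [0, 0, 0, 0, -1]
```

Here A through D still form one cluster because B and C each have enough neighbours to act as "core" points, while the lone point E is reported as noise rather than a group of one.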

answered Sep 18 '22 18:09

eykanal

Use something similar to the method you outlined in your question to get an approximate set of results, then whittle that approximate set down by doing exact distance calculations. If you pick your grid size (i.e. how much you round off your co-ordinates) correctly, you can at least hope to reduce the amount of work to an acceptable level, although choosing that grid size well takes some tuning.
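A rough Python sketch of that two-stage approach applied directly to latitude/longitude, under stated assumptions: a spherical Earth, data no further than 60 degrees from the equator, and no points near the 180 degree longitude seam. Grid cells are sized so that two points within the radius fall in the same or adjacent cells, and only those pairs get the exact haversine check. `close_pairs` and its parameters are hypothetical names.

```python
import math
from collections import defaultdict

EARTH_RADIUS_FT = 20_902_231  # spherical-Earth assumption (6,371 km)
FEET_PER_DEG_LAT = EARTH_RADIUS_FT * math.pi / 180

def haversine_ft(lat1, lon1, lat2, lon2):
    # Great-circle distance in feet between two points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))

def close_pairs(points, radius_ft, max_abs_lat=60.0):
    """Return sorted index pairs (i, j) of points within radius_ft of each other.
    Step 1 (approximate): bucket points by a rounded (lat, lon) grid cell at least
    radius_ft wide, so a close pair shares a cell or sits in adjacent cells.
    Step 2 (exact): run haversine only on pairs from neighbouring cells.
    Assumes |lat| <= max_abs_lat and ignores the +/-180 degree longitude seam."""
    lat_step = radius_ft / FEET_PER_DEG_LAT
    lon_step = lat_step / math.cos(math.radians(max_abs_lat))  # widen for shrinking longitude
    cells = defaultdict(list)
    for i, (lat, lon) in enumerate(points):
        cells[(math.floor(lat / lat_step), math.floor(lon / lon_step))].append(i)

    pairs = set()
    for (r, c), members in cells.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                for i in members:
                    for j in cells.get((r + dr, c + dc), ()):
                        if i < j and haversine_ft(*points[i], *points[j]) <= radius_ft:
                            pairs.add((i, j))
    return sorted(pairs)

pts = [(33.1239, -117.15), (33.1241, -117.15), (33.2, -117.15)]
print(close_pairs(pts, 500))  # [(0, 1)]
```

The pairs this returns can then be fed into whichever grouping rule you settle on.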

For example, the earthdistance extension to PostgreSQL works by converting lat/long pairs to (x,y,z) cartesian co-ordinates, modelling the Earth as a uniform sphere. PostgreSQL has a sophisticated indexing system that allows these co-ordinates, or boxes around them, to be indexed into R-trees, but you can whack something together that is still useful without that.
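The conversion itself is short. This Python sketch illustrates the same idea as described (Earth as a uniform sphere), not PostgreSQL's actual code; the 6,371 km radius and the name `to_xyz` are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # uniform-sphere assumption

def to_xyz(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude (degrees) to Earth-centred Cartesian (x, y, z)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))

print(to_xyz(0, 0))  # (6371.0, 0.0, 0.0)
```

Every converted point lies at distance `radius` from the origin, which is what the sphere-intersection test further on relies on.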

If you take your (x,y,z) triple and round it off (i.e. multiply by some factor and truncate to an integer, ideally rounding down so that small negative values don't share a box with small positive ones), you then have three integers that you can concatenate to produce a "box name", which identifies the box in your "grid" that the point is in.
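A minimal Python sketch of the "box name", assuming the grid spacing is given as a box size in the same units as (x, y, z); dividing by it is the same as multiplying by a factor. Two small choices here are mine rather than the answer's: `math.floor` instead of plain truncation, so the boxes either side of zero are not merged into one double-width box, and a separator between the integers so that, say, (1, 23, 4) and (12, 3, 4) cannot produce the same name. `box_name` is a hypothetical helper.

```python
import math

def box_name(x, y, z, box_size):
    """Snap (x, y, z) to a grid of cubes `box_size` wide and join the three
    integer indices into a string key usable in any hash table."""
    # floor() rounds toward minus infinity; int() would truncate toward zero.
    ix, iy, iz = (math.floor(v / box_size) for v in (x, y, z))
    return f"{ix}:{iy}:{iz}"

print(box_name(-0.5, 0.5, 12.3, 1.0))  # -1:0:12
```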

If you want to search for all points within X km of some target point, convert the target point to an (x,y,z) triple as well; it's then easy to generate all the "box names" around it. Eliminate all the boxes that don't intersect the Earth's surface (trickier, but evaluating x^2+y^2+z^2 against R^2 at each corner will tell you), and you end up with a list of boxes that candidate points can be in. Then just search for all points matching one of those boxes. That will also return some extra points, so as a final stage you need to calculate the actual distance to your target point and eliminate those that are too far away (again, this can be sped up by working in Cartesian co-ordinates and converting your great-circle search radius to a secant, i.e. straight-line chord, distance).
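Putting those steps together, here is a Python sketch of the whole search using a plain dict as the hash table. Assumptions (mine, not the answer's): a 6,371 km spherical Earth, boxes keyed by integer tuples instead of concatenated strings, and the simple corner test, which is only approximate but adequate while boxes are much smaller than the Earth. The great-circle radius d is converted to a chord length with 2R sin(d / 2R). All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

R = 6371.0  # km, uniform-sphere assumption

def to_xyz(lat_deg, lon_deg):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (R * math.cos(lat) * math.cos(lon), R * math.cos(lat) * math.sin(lon), R * math.sin(lat))

def box_of(p, size):
    return tuple(math.floor(v / size) for v in p)

def box_touches_sphere(box, size):
    """Keep a box only if some corners are inside the sphere and some outside.
    Approximate: an all-outside box can, very rarely, still clip the curved surface."""
    corners = product(*[(i * size, (i + 1) * size) for i in box])
    inside = [sum(c * c for c in corner) <= R * R for corner in corners]
    return any(inside) and not all(inside)

def build_index(points, size):
    """Hash table: box key -> list of (point id, xyz)."""
    index = defaultdict(list)
    for pid, (lat, lon) in points.items():
        p = to_xyz(lat, lon)
        index[box_of(p, size)].append((pid, p))
    return index

def search(index, size, lat, lon, radius_km):
    target = to_xyz(lat, lon)
    chord = 2 * R * math.sin(radius_km / (2 * R))  # great-circle radius -> secant distance
    reach = math.ceil(chord / size)  # boxes to look outward along each axis
    centre = box_of(target, size)
    hits = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        box = tuple(c + o for c, o in zip(centre, offset))
        if not box_touches_sphere(box, size):
            continue  # box cannot hold any point on the Earth's surface
        for pid, p in index.get(box, ()):
            if math.dist(p, target) <= chord:  # exact final filter in Cartesian space
                hits.append(pid)
    return sorted(hits)

points = {"a": (33.1239, -117.15), "b": (33.1241, -117.15), "c": (33.2, -117.15)}
index = build_index(points, 1.0)
print(search(index, 1.0, 33.1239, -117.15, 0.15))  # ['a', 'b']
```

With a 150 m (about 500 ft) radius and 1 km boxes, `reach` is 1, so the search covers the 27 boxes around the target before the corner test prunes them.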

The fiddling comes down to making sure you don't have to search too many boxes while not bringing in too many extra points. I've found it useful to index each point on several different grids (e.g. resolutions of 1 km, 5 km, 25 km, 125 km, etc.). Ideally you want to search just one box; remember that this expands to at least 27 (a 3x3x3 block) as soon as your target radius exceeds your grid size.
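A Python sketch of indexing on several grids at once, using the example resolutions from the answer. `RESOLUTIONS_KM`, `build_multi_index` and `pick_resolution` are hypothetical names, and points are assumed to be already converted to Earth-centred (x, y, z) in km. At query time you would pick the finest grid whose boxes are at least as wide as the search radius, which keeps the search to the 27 boxes around the target.

```python
import math
from collections import defaultdict

RESOLUTIONS_KM = (1, 5, 25, 125)  # the example resolutions mentioned above

def box_of(xyz, size):
    return tuple(math.floor(v / size) for v in xyz)

def build_multi_index(points_xyz):
    """One hash table per resolution: size -> {box key -> [point ids]}.
    points_xyz maps a point id to Earth-centred (x, y, z) in km."""
    index = {size: defaultdict(list) for size in RESOLUTIONS_KM}
    for pid, xyz in points_xyz.items():
        for size in RESOLUTIONS_KM:
            index[size][box_of(xyz, size)].append(pid)
    return index

def pick_resolution(chord_km):
    """Finest grid whose boxes are at least as wide as the search radius,
    falling back to the coarsest grid for very large radii."""
    for size in RESOLUTIONS_KM:
        if chord_km <= size:
            return size
    return RESOLUTIONS_KM[-1]

print(pick_resolution(0.15))  # 1
```

The trade-off is storage: each point appears once per resolution, which is consistent with the answer's remark that the indices are quite big.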

I've used this technique to construct a spatial index using Lucene rather than doing calculations in a SQL database. It does work, although setting it up takes some fiddling, and the indices take a while to generate and are quite big. Using an R-tree to hold all the co-ordinates is a much nicer approach, but would take more custom coding; this technique basically just requires a fast hash-table lookup (so it would probably work well with all the NoSQL databases that are the rage these days, and should be usable in a SQL database too).

answered Sep 19 '22 18:09

araqnid
