
Efficient algorithm for calculating areas on geographical map with the highest density of spots

Let's say I have a geographical map where points are represented by latitude/longitude. I have a number of points on this map, and points can be added/deleted/moved at any time.

What I need is to get the "hottest spots" - the areas with the highest number of points divided by area - in other words, the areas with the highest density of points.

I need an efficient data structure and an algorithm that re-calculates the hottest spots on any change. The computational and memory complexity must be minimal, because the number of points could get very high.

I wish to maintain a list of the hottest spots in descending order - the most crowded area first, then the less crowded ones. It's OK for the list to have a limited size - for example, the 100 hottest spots.

Of course, to prevent a single isolated point from counting as 100% density, there is a minimal area (defined as a constant).

The definition of "area" here is any perceivable area on the map that contains points. It could be the whole map, but the algorithm shouldn't see that as a hot spot of course =)

Thanks ahead! If it needs any clarification please say so...

asked Feb 25 '23 by SirKnigget

1 Answer

What you are doing is statistically known as '2-dimensional density estimation' (so you know where to look).

One common method is 'kernel smoothing'. Imagine a disc sitting on each of your data points. The smoothed density at any location is then the number of discs overlapping that location. That's a kernel smoothing using a uniform kernel of fixed radius.

You don't have to use uniform kernels, nor do they have to be the same size at all points - this is now "adaptive bandwidth kernel smoothing".

That gives you a grid which you can update pretty easily, especially if your kernel has finite support (a Gaussian (aka Normal) kernel is infinite, but you would clip it to your study area). Every time you add or remove a point, you add or remove its kernel's contribution to the density.
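The incremental grid update can be sketched as follows. This is a minimal sketch, not from the original answer: the grid size, kernel radius, and function names are all assumptions, and it uses the uniform (disc) kernel described above.

```python
import numpy as np

# Hypothetical parameters - choose these to suit your map and minimal area.
GRID_W, GRID_H = 200, 200   # density grid resolution
RADIUS = 3                  # uniform kernel radius, in grid cells

# Precompute the disc-shaped kernel footprint once.
ys, xs = np.ogrid[-RADIUS:RADIUS + 1, -RADIUS:RADIUS + 1]
KERNEL = (xs * xs + ys * ys <= RADIUS * RADIUS).astype(float)

density = np.zeros((GRID_H, GRID_W))

def _apply_kernel(grid, row, col, sign):
    """Add (+1) or remove (-1) one point's kernel contribution, clipped at the grid edges."""
    r0, r1 = max(row - RADIUS, 0), min(row + RADIUS + 1, GRID_H)
    c0, c1 = max(col - RADIUS, 0), min(col + RADIUS + 1, GRID_W)
    kr0 = r0 - (row - RADIUS)
    kc0 = c0 - (col - RADIUS)
    grid[r0:r1, c0:c1] += sign * KERNEL[kr0:kr0 + (r1 - r0), kc0:kc0 + (c1 - c0)]

def add_point(row, col):
    _apply_kernel(density, row, col, +1)

def remove_point(row, col):
    _apply_kernel(density, row, col, -1)
```

Each add or remove touches only the cells under one kernel footprint, so the cost per update is O(kernel area), independent of the total number of points.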

So that's half the problem - you now have a grid of densities. Then you can use clustering algorithms to partition it and find separate peaks according to whatever criteria a) defines a 'peak' and b) defines it as separate from an adjacent peak. Someone else has already suggested clustering algorithms. I have done this (ten years ago) using clustering functions in "R", the statistical package. Speed isn't its strong point though...
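A full clustering pass is one option; a cheaper sketch of the peak-finding half is below. It treats a grid cell as a 'peak' if it is a strict local maximum in its 3x3 neighbourhood and exceeds a threshold; the threshold, `k`, and the function name are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def top_hotspots(density, k=100, threshold=0.0):
    """Return up to k (density, row, col) tuples, hottest first."""
    # Pad with -inf so edge cells compare correctly against the border.
    padded = np.pad(density, 1, constant_values=-np.inf)
    # Maximum over each cell's 8 neighbours, via shifted views of the padded grid.
    neighbours = np.max(
        [padded[dr:dr + density.shape[0], dc:dc + density.shape[1]]
         for dr in (0, 1, 2) for dc in (0, 1, 2) if (dr, dc) != (1, 1)],
        axis=0,
    )
    peak_mask = (density > neighbours) & (density > threshold)
    rows, cols = np.nonzero(peak_mask)
    peaks = sorted(zip(density[rows, cols], rows, cols), reverse=True)
    return peaks[:k]
```

Strict local maxima won't separate peaks that sit on a shared plateau; for that, or for merging nearby peaks into one hot spot, you would fall back to a proper clustering step as described above.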

You might want to take this over to http://stats.stackexchange.com

answered May 04 '23 by Spacedman