Let's say I have a geographical map, where points are represented by latitude/longitude. I have a number of points on this map, and points can be added/deleted/moved at any time.
What I need is to get the "hottest spots" - the areas with the highest number of points per unit area - in other words, the areas with the highest density of points.
I need an efficient data structure, and an algorithm that re-calculates the hottest spots on any change. Computational and memory complexity must stay low, because the number of points can get very high.
I wish to maintain a list of the hottest spots in descending order - the most crowded area first, then the less crowded ones. It's OK for the list to have a limited size - for example, the 100 hottest spots.
Of course, to prevent a 100% density at a single isolated point, there is a minimal area (defined as a constant).
The definition of "area" here is any perceivable region on the map that contains points. It could be the whole map, but the algorithm shouldn't count that as a hot spot, of course =)
Thanks in advance! If anything needs clarification, please say so...
What you are doing is known in statistics as '2-dimensional density estimation' (so you know where to look).
One common method is 'kernel smoothing'. Imagine a disc sitting on each of your data points. The kernel-smoothed value at any location is the number of discs overlapping there. That's kernel smoothing with a uniform kernel of fixed radius.
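For concreteness, here is a minimal sketch of the uniform-kernel version in Python. The grid size GRID, map extent EXTENT and disc radius R are made-up placeholder values, not anything from the question:

    import numpy as np

    GRID = 200                      # grid cells per side (assumed)
    EXTENT = 100.0                  # map spans [0, EXTENT) in x and y (assumed)
    R = 5.0                         # disc radius = kernel bandwidth (assumed)
    CELL = EXTENT / GRID

    ys, xs = np.mgrid[0:GRID, 0:GRID]
    cx = (xs + 0.5) * CELL          # x coordinate of each cell centre
    cy = (ys + 0.5) * CELL          # y coordinate of each cell centre

    def disc_density(points):
        """Density grid: each cell counts the discs that cover its centre."""
        density = np.zeros((GRID, GRID))
        for px, py in points:
            density[(cx - px) ** 2 + (cy - py) ** 2 <= R ** 2] += 1
        return density

    density = disc_density([(10.0, 12.0), (11.5, 13.0), (80.0, 40.0)])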
You don't have to use uniform kernels, nor do they have to be the same size at all points - this is now "adaptive bandwidth kernel smoothing".
That gives you a grid which you can update pretty easily, especially if you have a finite kernel (you can use a Gaussian (aka Normal) kernel, which has infinite support, but it would be clipped to your study area). Every time you add or remove a point, you add or subtract its kernel's contribution to the density grid.
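An incremental update might look roughly like this - again a sketch: GRID, EXTENT, SIGMA and the 3-sigma truncation are assumed values, and a truncated Gaussian stands in for whatever kernel you pick. Only the cells under the kernel's footprint are touched, so the cost per update is independent of the total number of points:

    import numpy as np

    GRID, EXTENT = 200, 100.0          # assumed grid resolution and map extent
    CELL = EXTENT / GRID
    SIGMA = 2.0                        # Gaussian bandwidth in map units (assumed)
    CUTOFF = 3.0 * SIGMA               # truncate the kernel at 3 sigma (assumed)
    density = np.zeros((GRID, GRID))

    def update_point(px, py, sign=+1):
        """Add (sign=+1) or remove (sign=-1) one point's kernel contribution,
        touching only the cells inside the truncated kernel window."""
        i0 = max(int((px - CUTOFF) / CELL), 0)
        i1 = min(int((px + CUTOFF) / CELL) + 1, GRID)
        j0 = max(int((py - CUTOFF) / CELL), 0)
        j1 = min(int((py + CUTOFF) / CELL) + 1, GRID)
        xs = (np.arange(i0, i1) + 0.5) * CELL
        ys = (np.arange(j0, j1) + 0.5) * CELL
        d2 = (xs[:, None] - px) ** 2 + (ys[None, :] - py) ** 2
        kernel = np.exp(-d2 / (2 * SIGMA ** 2))
        kernel[d2 > CUTOFF ** 2] = 0.0
        density[i0:i1, j0:j1] += sign * kernel

    update_point(10.0, 12.0)           # point added
    update_point(10.0, 12.0, sign=-1)  # point removed again

A moved point is just a removal at the old location followed by an addition at the new one.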
So that's half the problem - you now have a grid of densities. Then you can use clustering algorithms to partition it and find separate peaks, according to whatever criteria (a) define a 'peak' and (b) define it as separate from an adjacent peak. Someone else has already suggested clustering algorithms. I have done this (ten years ago) using the clustering functions in "R", the statistical package. Speed isn't its strong point though...
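If a full clustering pass is too slow, one cheap stand-in (my suggestion, not the R approach above) is to treat local maxima of the density grid as candidate peaks and keep the top 100. scipy's maximum_filter does the neighbourhood test; the window size is an assumed parameter playing the role of "separate from an adjacent peak":

    import numpy as np
    from scipy import ndimage

    def top_hotspots(density, k=100, neighborhood=5):
        """Return up to k (row, col, density) peaks, densest first.
        A cell is a peak if it is the maximum of its neighbourhood window."""
        local_max = ndimage.maximum_filter(density, size=neighborhood)
        peaks = (density == local_max) & (density > 0)
        rows, cols = np.nonzero(peaks)
        order = np.argsort(density[rows, cols])[::-1][:k]
        return [(rows[i], cols[i], density[rows[i], cols[i]]) for i in order]

A density threshold stricter than the `density > 0` used here would also take care of the questioner's "minimal area" requirement, by discarding broad, weak bumps that only look dense because the region considered is huge.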
You might want to take this over to http://stats.stackexchange.com