Best way to efficiently find high density regions

Tags:

Over the course of my coding, I have come across a problem as follows: Find the a region of fixed size in a 2D space that has the highest density of particles. The particles can be considered generally distributed randomly over the entire space, but in theory there should be some areas that have a higher density.

For example, 100 particles are placed randomly in a 2D grid that is 500x500, and I need to find the 50x50 region with the most particles (highest density).

Is there some other way to calculate the best region besides brute force testing every possible region (in this case about over 200000 regions)? That would scale up at O(n^2) for an n-length axis.

484

asked Oct 23 '13 05:10

WilBur

1 Answers

Algorithm 1

Create a 500x500 2D array, where each cell contains the count of the number of particles in that cell. Then convolve that array with a 50x50 kernel, the resulting array will have the count of particles in a 50x50 region in each cell. Then find the cell with the largest value.

If you are using a 50x50 box as a region, the kernel can be decomposed into two separate convolutions, one for each axis. The resulting algorithm is O(n^2) space and time, where n is the width and height of the 2D space you are searching.

As a reminder, a one-dimensional convolution with a boxcar function can be completed in O(n) time and space and it can be done in place. Let x(t) be the input for t=1..n, and let y(t) be the output. Define x(t)=0 and y(t)=0 for t<1 and t>n. Define the kernel f(t) to be 1 for 0..d-1 and 0 elsewhere. The definition for convolution gives us the following formula:

y(t) = sum i x(t-i) * f(i) = sum i=0..d-1 x(t-i)

This looks like it takes time O(n*d), but we can rewrite it as a recurrence:

y(t) = y(t-1) + x(t) - x(t-d)

This shows that the one-dimensional convolution is O(n), independent of d. To perform the two-dimensional convolution, you simply perform the one-dimensional convolution for each axis. This works because the boxcar kernel can be decomposed: in general, most kernels cannot be decomposed. The Gaussian kernel is another kernel that can be decomposed, which is why Gaussian blur in an image editing program is so fast.

For the kind of numbers you specify, this will be extremely fast. 500x500 is an extremely small data set, and your computer can check 202,500 regions in a few milliseconds at most. You will have to ask yourself whether it is worth the extra hours, days, or weeks of time it will take you to optimize further.

This is the same as justhalf's solution, except due to the decomposed convolution, the region size does not affect the algorithm's speed.

Algorithm 2

Assume there is at least one point. Without loss of generality, consider the 2D space to be the entire plane. Let d be the width and height of the region. Let N be the number of points.

Lemma: There exists a region of maximum density which has a point on its left edge.

Proof: Let R be a region of maximum density. Let R' be the same region, translated right by the distance between the left edge of R and the leftmost point in R. All points in R must also lie in R', therefore R' is also a region of maximum density.

The algorithm

Insert all points into a K-D tree. This can be done in O(N log² N) time.
For each point, consider the region of width d and height 2d where the point is centered on the left edge of the region. Call this region R.
Query the K-D tree for the points in region R. Call this set S. This can be done in O(N^1/2+|S|) time.
Find the d x d subregion of R containing the largest number of points in S. This can be done in O(|S| log |S|) time by sorting S by y-coordinate and then performing a linear scan.

The resulting algorithm has a time of O(N^3/2 + N |S| log |S|).

Comparison

Algorithm #1 is superior to algorithm #2 when the density is high. Algorithm #2 is only superior when the density of particles is very low, and the density at which algorithm #2 is superior decreases as the total board size increases.

Note that the continuous case can be considered to have zero density, at which point only algorithm #2 works.

107

answered Oct 20 '22 19:10

Dietrich Epp

Related questions
                            
                                PHP equivalent of .Net Entity Framework [closed]
                            
                                Can std::hash be used to hash function pointers?
                            
                                Timer in Background
                            
                                Restart a gif animation without reloading the file
                            
                                "PDB does not match image" error in C# VS2010 project
                            
                                Is it better to use root logger or named logger in Python
                            
                                Capture "Segmentation fault" message for a crashed subprocess: no out and err after a call to communicate()
                            
                                reprojectImageTo3D() in OpenCV
                            
                                Font Family Open Sans not being used
                            
                                Changing a MediaStream of RTCPeerConnection
                            
                                Rails unable to autoload constant from file despite being defined in that file
                            
                                Run custom ROM on Android Emulator

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With