I have a set of points contained within a rectangle. I'd like to split the rectangle into subrectangles based on point density (specifying either the number of subrectangles or the desired density, whichever is easiest).
The partitioning doesn't have to be exact (almost any approximation better than a regular grid would do), but the algorithm has to cope with a large number of points: approximately 200 million. The desired number of subrectangles, however, is substantially lower (around 1000).
Does anyone know any algorithm which may help me with this particular task?
Just to understand the problem: the following is crude and performs badly, but I want to know whether the result is what you want.
Assumption: the number of rectangles is even.
Assumption: the point distribution is markedly 2D (no big accumulation along a single line).
Procedure:
Bisect n/2 times along either axis. For each previously determined rectangle, sweep from one end to the other, counting the points passed and storing the running count at each step. Once the points are counted, bisect the rectangle at the position selected from those running counts.
Is that what you want to achieve?
I think I'd start with the following, which is close to what @belisarius already proposed. If you have any additional requirements, such as preferring 'nearly square' rectangles to 'long and thin' ones you'll need to modify this naive approach. I'll assume, for the sake of simplicity, that the points are approximately randomly distributed.
I hope that outlines the proposal well enough. It has limitations: it will produce a number of rectangles equal to some power of 2, so adjust it if that's not good enough. I've phrased it recursively, but it's ideal for parallelisation. Each split creates two tasks, each of which splits a rectangle and creates two more tasks.
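As a rough illustration of this kind of recursive bisection, here is a minimal sketch in Python. It is not the exact procedure from the answer: the function name `split_by_median`, the choice to cut across the longer side, and the use of a full sort to find the median are all assumptions made for the example. With 200 million points you would probably replace the sort with a selection algorithm or a median estimated from a sample.

```python
import random

def split_by_median(points, rect, depth):
    """Recursively bisect rect so each half holds about half the points.

    points: list of (x, y) tuples lying inside rect
    rect:   (xmin, ymin, xmax, ymax)
    depth:  number of bisection levels; gives up to 2**depth rectangles
    Returns a list of (rect, point_count) pairs.
    """
    if depth == 0 or len(points) < 2:
        return [(rect, len(points))]
    xmin, ymin, xmax, ymax = rect
    # Assumption: cut across the longer side to discourage long, thin pieces.
    axis = 0 if (xmax - xmin) >= (ymax - ymin) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    cut = ordered[mid][axis]  # coordinate of the median point
    left, right = ordered[:mid], ordered[mid:]
    if axis == 0:
        r1, r2 = (xmin, ymin, cut, ymax), (cut, ymin, xmax, ymax)
    else:
        r1, r2 = (xmin, ymin, xmax, cut), (xmin, cut, xmax, ymax)
    # The two recursive calls are independent, so they could run as parallel tasks.
    return (split_by_median(left, r1, depth - 1)
            + split_by_median(right, r2, depth - 1))

# Usage: 10 levels would give 1024 rectangles; 3 levels are used here for brevity.
random.seed(0)
sample = [(random.random(), random.random()) for _ in range(1000)]
for rect, count in split_by_median(sample, (0.0, 0.0, 1.0, 1.0), 3):
    print(rect, count)
```

Each level halves the point count per rectangle, which is why the number of rectangles comes out as a power of 2.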
If you don't like that approach, perhaps you could start with a regular grid with some multiple (10 - 100 perhaps) of the number of rectangles you want. Count the number of points in each of these tiny rectangles. Then start gluing the tiny rectangles together until the less-tiny rectangle contains (approximately) the right number of points. Or, if it satisfies your requirements well enough, you could use this as a discretisation method and integrate it with my first approach, but only place the cutting lines along the boundaries of the tiny rectangles. This would probably be much quicker as you'd only have to count the points in each tiny rectangle once.
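The discretised variant could look roughly like the sketch below: count the points once into a fine grid, then bisect using only the cell counts, placing each cut on a cell boundary. The names `grid_counts` and `split_cells`, the half-open index convention, and the rule of cutting across the wider side are assumptions for illustration only.

```python
import random

def grid_counts(points, rect, nx, ny):
    """Count points per cell of an nx-by-ny grid over rect; returns counts[j][i]."""
    xmin, ymin, xmax, ymax = rect
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Clamp so points on the far edge land in the last cell.
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[j][i] += 1
    return counts

def split_cells(counts, block, depth):
    """Bisect a block of cells, cutting only along cell boundaries.

    block: (i0, j0, i1, j1), half-open ranges of column and row indices.
    Returns a list of (block, point_count) pairs.
    """
    i0, j0, i1, j1 = block
    total = sum(counts[j][i] for j in range(j0, j1) for i in range(i0, i1))
    w, h = i1 - i0, j1 - j0
    if depth == 0 or (w == 1 and h == 1):
        return [(block, total)]
    cut_columns = w >= h  # assumption: cut across the wider side
    if cut_columns:
        sums = [sum(counts[j][i] for j in range(j0, j1)) for i in range(i0, i1)]
    else:
        sums = [sum(counts[j][i] for i in range(i0, i1)) for j in range(j0, j1)]
    # Pick the boundary whose running count is closest to half the total.
    running, best_k, best_err = 0, 1, None
    for k in range(1, len(sums)):
        running += sums[k - 1]
        err = abs(2 * running - total)
        if best_err is None or err < best_err:
            best_k, best_err = k, err
    if cut_columns:
        a, b = (i0, j0, i0 + best_k, j1), (i0 + best_k, j0, i1, j1)
    else:
        a, b = (i0, j0, i1, j0 + best_k), (i0, j0 + best_k, i1, j1)
    return split_cells(counts, a, depth - 1) + split_cells(counts, b, depth - 1)

# Usage: one pass over the points, then all splitting works on the small grid.
random.seed(1)
sample = [(random.random(), random.random()) for _ in range(5000)]
cells = grid_counts(sample, (0.0, 0.0, 1.0, 1.0), 32, 32)
for block, count in split_cells(cells, (0, 0, 32, 32), 3):
    print(block, count)
```

The points are touched only once, in `grid_counts`; the balance of the result is limited by the grid resolution, so a finer grid gives more even counts at the cost of more memory.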
I haven't really thought about the running time of either of these; I have a preference for the former approach 'cos I do a fair amount of parallel programming and have oodles of processors.