how do I select a subset of points at a regular density? More formally,
Given
dist
(e.g., Euclidean distance),how can I select a smallest subset B that satisfies below?
dist(x,y) <= d
My current best shot is to
and repeat the whole procedure for best luck. But are there better ways?
I'm trying to do this with 280,000 18-D points, but my question is in general strategy. So I also wish to know how to do it with 2-D points. And I don't really need a guarantee of a smallest subset. Any useful method is welcome. Thank you.
y
for which min(d(x,y) for x in selected)
is largestI'll call it bottom-up and the one I originally posted top-down. This is much faster in the beginning, so for sparse sampling this should be better?
If guarantee of optimality is not required, I think these two indicators could be useful:
max {y in unselected} min(d(x,y) for x in selected)
min {y in selected != x} min(d(x,y) for x in selected)
RC is minimum allowed d, and there is no absolute inequality between these two. But RC <= RE
is more desirable.
For a little demonstration of that "performance measure," I generated 256 2-D points distributed uniformly or by standard normal distribution. Then I tried my top-down and bottom-up methods with them. And this is what I got:
RC is red, RE is blue. X axis is number of selected points. Did you think bottom-up could be as good? I thought so watching the animation, but it seems top-down is significantly better (look at the sparse region). Nevertheless, not too horrible given that it's much faster.
Here I packed everything.
http://www.filehosting.org/file/details/352267/density_sampling.tar.gz
You can model your problem with graphs, assume points as nodes, and connect two nodes with edge if their distance is smaller than d, Now you should find the minimum number of vertex such that they are with their connected vertices cover all nodes of graph, this is minimum vertex cover problem (which is NP-Hard in general), but you can use fast 2-approximation : repeatedly taking both endpoints of an edge into the vertex cover, then removing them from the graph.
P.S: sure you should select nodes which are fully disconnected from the graph, After removing this nodes (means selecting them), your problem is vertex cover.
A genetic algorithm may probably produce good results here.
update:
I have been playing a little with this problem and these are my findings:
A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:
A kd-tree can be used to perform the look ups in step 3 relatively fast.
The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.
Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.
For mutation, giving a chromosome representing a subset B, I randomly choose an hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then, I remove from B all the points that are also in the hyperball and use the random-selection to complete it again.
For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.
I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here.
The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl
It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With