How to select points at a regular density

Question

how do I select a subset of points at a regular density? More formally,

Given

a set A of irregularly spaced points,
a metric of distance dist (e.g., Euclidean distance),
and a target density d,

how can I select a smallest subset B that satisfies below?

for every point x in A,
there exists a point y in B
which satisfies dist(x,y) <= d

My current best shot is to

start with A itself
pick out the closest (or just particularly close) couple of points
randomly exclude one of them
repeat as long as the condition holds

and repeat the whole procedure for best luck. But are there better ways?

I'm trying to do this with 280,000 18-D points, but my question is in general strategy. So I also wish to know how to do it with 2-D points. And I don't really need a guarantee of a smallest subset. Any useful method is welcome. Thank you.

bottom-up method

select a random point
select among unselected y for which min(d(x,y) for x in selected) is largest
keep going!

I'll call it bottom-up and the one I originally posted top-down. This is much faster in the beginning, so for sparse sampling this should be better?

performance measure

If guarantee of optimality is not required, I think these two indicators could be useful:

radius of coverage: max {y in unselected} min(d(x,y) for x in selected)
radius of economy: min {y in selected != x} min(d(x,y) for x in selected)

RC is minimum allowed d, and there is no absolute inequality between these two. But RC <= RE is more desirable.

my little methods

For a little demonstration of that "performance measure," I generated 256 2-D points distributed uniformly or by standard normal distribution. Then I tried my top-down and bottom-up methods with them. And this is what I got:

bottom-up.norm top-down.norm

RC is red, RE is blue. X axis is number of selected points. Did you think bottom-up could be as good? I thought so watching the animation, but it seems top-down is significantly better (look at the sparse region). Nevertheless, not too horrible given that it's much faster.

Here I packed everything.

http://www.filehosting.org/file/details/352267/density_sampling.tar.gz

Saeed Amiri · Accepted Answer

You can model your problem with graphs, assume points as nodes, and connect two nodes with edge if their distance is smaller than d, Now you should find the minimum number of vertex such that they are with their connected vertices cover all nodes of graph, this is minimum vertex cover problem (which is NP-Hard in general), but you can use fast 2-approximation : repeatedly taking both endpoints of an edge into the vertex cover, then removing them from the graph.

P.S: sure you should select nodes which are fully disconnected from the graph, After removing this nodes (means selecting them), your problem is vertex cover.

salva · Answer

A genetic algorithm may probably produce good results here.

update:

I have been playing a little with this problem and these are my findings:

A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:

start with B empty
select a random point x from A and place it in B
remove from A every point y such that dist(x, y) < d
while A is not empty go to 2

A kd-tree can be used to perform the look ups in step 3 relatively fast.

The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.

Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.

For mutation, giving a chromosome representing a subset B, I randomly choose an hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then, I remove from B all the points that are also in the hyperball and use the random-selection to complete it again.

For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.

I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here.

The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl

It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.

How to select points at a regular density