I want to create a large set of random point cloud in 2D plane that are non-degenerate (no 3 points in a straight line in the whole set). I have a naive solution which generates a random float pair P_new(x,y) and checks with every pair of points (P1, P2) generated till now if point (P1, P2, P) lie in same line or not. This takes O(n^2) checks for each new point added to the list making the whole complexity O(n^3) which is very slow if I want to generate more than 4000 points (takes more than 40 mins). Is there a faster way to generate these set of non-degenerate points?
Instead of checking the possible points collinearity on each cycle iteration, you could compute and compare coefficients of linear equations. This coefficients should be store in container with quick search. I consider using std::set, but unordered_map could fit either and could lead to even better results.
To sum it up, I suggest the following algorithm:
p
;p
and existing points (I mean usual A
,B
&C
). Here you need to do n
computations;n*log(n^2)
operations at maximum.O(log(n))
too.The whole complexity is reduced to O(n^2*log(n))
.
This algorithm requires additional storing of n^2*sizeof(Coefficient)
memory. But this seems to be ok if you are trying to compute 4000 points only.
O(n^2 log n) algorithm can be easily constructed in the following way:
For each point P in the set:
Sort other points by polar angle (cross-product as a comparison function, standard idea, see 2D convex hull gift-wrapping algorithm for example). In this step you should consider only points Q that satisfy
Q.x > P.x || Q.y >= P.y
Iterate over sorted list, equal points are lying on the same line.
Sorting is done in O(n log n), step 2. is O(n). This gives O(n^2 log n) for removing degenerate points.
Determining whether a set of points is degenerate is a 3SUM-hard problem. (The very first problem listed is determining whether three lines contains a common point; the equivalent problem under projective duality is whether three points belong to a common line.) As such, it's not reasonable to hope that a generate-and-test solution will be significantly faster than n2.
What are your requirements for the distribution?
generate random point Q
for previous points P calculate (dx, dy) = P - Q
and B = (asb(dx) > abs(dy) ? dy/dx : dx/dy)
sort the list of points P by its B value, so that points that form a line with Q will be in near positions inside the sorted list.
walk over the sorted list checking where Q forms a line with the current P value being considered and some next values that are nearer than a given distance.
Perl implementation:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Math::Vector::Real;
use Math::Vector::Real::Random;
use Sort::Key::Radix qw(nkeysort);
use constant PI => 3.14159265358979323846264338327950288419716939937510;
@ARGV <= 2 or die "Usage:\n $0 [n_points [tolerance]]\n\n";
my $n_points = shift // 4000;
my $tolerance = shift // 0.01;
$tolerance = $tolerance * PI / 180;
my $tolerance_arctan = 3 / 2 * $tolerance;
# I got to that relation using no so basic maths in a hurry.
# it may be wrong!
my $tolerance_sin2 = sin($tolerance) ** 2;
sub cross2d {
my ($p0, $p1) = @_;
$p0->[0] * $p1->[1] - $p1->[0] * $p0->[1];
}
sub line_p {
my ($p0, $p1, $p2) = @_;
my $a0 = $p0->abs2 || return 1;
my $a1 = $p1->abs2 || return 1;
my $a2 = $p2->abs2 || return 1;
my $cr01 = cross2d($p0, $p1);
my $cr12 = cross2d($p1, $p2);
my $cr20 = cross2d($p2, $p0);
$cr01 * $cr01 / ($a0 * $a1) < $tolerance_sin2 or return;
$cr12 * $cr12 / ($a1 * $a2) < $tolerance_sin2 or return;
$cr20 * $cr20 / ($a2 * $a0) < $tolerance_sin2 or return;
return 1;
}
my ($c, $f1, $f2, $f3) = (0, 1, 1, 1);
my @p;
GEN: for (1..$n_points) {
my $q = Math::Vector::Real->random_normal(2);
$c++;
$f1 += @p;
my @B = map {
my ($dx, $dy) = @{$_ - $q};
abs($dy) > abs($dx) ? $dx / $dy : $dy / $dx;
} @p;
my @six = nkeysort { $B[$_] } 0..$#B;
for my $i (0..$#six) {
my $B0 = $B[$six[$i]];
my $pi = $p[$six[$i]];
for my $j ($i + 1..$#six) {
last if $B[$six[$j]] - $B0 > $tolerance_arctan;
$f2++;
my $pj = $p[$six[$j]];
if (line_p($q - $pi, $q - $pj, $pi - $pj)) {
$f3++;
say "BAD: $q $pi-$pj";
redo GEN;
}
}
}
push @p, $q;
say "GOOD: $q";
my $good = @p;
my $ratiogood = $good/$c;
my $ratio12 = $f2/$f1;
my $ratio23 = $f3/$f2;
print STDERR "gen: $c, good: $good, good/gen: $ratiogood, f2/f1: $ratio12, f3/f2: $ratio23 \r";
}
print STDERR "\n";
The tolerance indicates the acceptable error in degrees when considering if three points are in a line as π - max_angle(Q, Pi, Pj)
.
It does not take into account the numerical instabilities that can happen when subtracting vectors (i.e |Pi-Pj|
may be several orders of magnitude smaller than |Pi|
). An easy way to eliminate that problem would be to also require a minimum distance between any two given points.
Setting tolerance to 1e-6, the program just takes a few seconds to generate 4000 points. Translating it to C/C++ would probably make it two orders of magnitude faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With