I'm sure the opposite has been asked many times but I couldn't find any answers on how to generate bad random numbers.
I want to write a small program for cluster analysis and want to generate some random points for testing. If I just inserted 1000 points with random coordinates, they would be scattered all over the field, which would make a cluster analysis worthless.
Is there a simple way to generate random numbers that form clusters?
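I already thought about using random()*random() instead of random(), which I believed generates normally distributed numbers (I think I read this somewhere here on Stack Overflow), although the product of two uniform values is actually skewed toward 0 rather than normally distributed.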
A second approach would be to pick a few areas at random and run the point generation again within each of them, which would of course produce a cluster in each area.
Do you have a better idea?
To generate an odd number, let's say between 0 and 100, we will first generate a random number between 0 and 49, then multiply this number by 2 (this will hence become an even number between 0 and 98) and finally we will add 1 to make it an odd number (between 1 and 99).
Indeed, it is fundamentally impossible to produce truly random numbers on any deterministic device.
Seventeen is: Described at MIT as 'the least random number', according to the Jargon File. This is supposedly because in a study where respondents were asked to choose a random number from 1 to 20, 17 was the most common choice. This study has been repeated a number of times.
If you are deliberately producing well-formed clusters (rather than completely random clusters), you could combine the two approaches: pick a cluster center at random, and then put lots of points around it in a normal distribution.
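A minimal sketch of that idea in Python, assuming a square field of side `size` and a per-axis standard deviation `sigma`. The function name and every parameter here are made up for illustration, not taken from any library:

```python
import random

def gaussian_clusters(n_clusters, points_per_cluster, size=100.0, sigma=3.0, seed=None):
    """Return (x, y) points scattered normally around random cluster centers."""
    rng = random.Random(seed)  # seeded generator so test data is reproducible
    points = []
    for _ in range(n_clusters):
        # Cluster center: uniformly random inside the field (assumed square).
        cx, cy = rng.uniform(0, size), rng.uniform(0, size)
        for _ in range(points_per_cluster):
            # Each coordinate is normally distributed around the center.
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return points

points = gaussian_clusters(n_clusters=5, points_per_cluster=200, seed=42)
```

A smaller `sigma` gives tighter clusters. Points can land slightly outside the field, so clamp or reject them if your analysis needs hard bounds.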
As well as working in Cartesian coordinates (x, y), you could use a radial method to distribute points for a particular cluster. Choose a random angle (0 to 2π radians), then choose a radius. Note that because circumference is proportional to radius, a uniformly chosen radius makes the points denser per unit area close to the centre, even though each radius is equally likely. Modify the radial distribution to produce a more tightly packed cluster.
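The radial method could look roughly like this in Python. The names `radial_cluster`, `max_radius` and `uniform_area` are assumptions for the sketch. With a uniformly chosen radius the points bunch up near the centre; taking the square root of the uniform value instead spreads them evenly over the disc's area:

```python
import math
import random

def radial_cluster(cx, cy, n, max_radius=5.0, uniform_area=False, seed=None):
    """Return n points around (cx, cy) using a random angle and radius."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # random direction
        u = rng.random()                         # uniform value in [0, 1)
        # Uniform radius -> denser near the centre; sqrt(u) -> even over the disc.
        r = max_radius * (math.sqrt(u) if uniform_area else u)
        points.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return points

cluster = radial_cluster(50.0, 50.0, n=300, seed=7)
```

For an even tighter core you could raise `u` to a power greater than 1 before scaling, which pulls more of the points toward the centre.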
Or you could use data derived from the real world for semi-random point distributions with natural clustering. Recently I've been doing quite a bit of geospatial cluster analysis. For this I have used real-world data: ZIP code centroids (which form natural clusters around cities) and restaurant locations. Another suggestion: you could use a stellar catalogue or galactic catalogue.
Generate a few anchors as truly random numbers. Then generate noise around them:
anchor + dist * (random() - 0.5)
This will generate clustered numbers that are evenly distributed over an interval of width dist centred on each anchor.
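Applied to 2D points, the anchor formula might be sketched as follows in Python. The function name, the `size` of the field and the other parameters are assumptions chosen for the example:

```python
import random

def anchored_points(n_anchors, points_per_anchor, dist=10.0, size=100.0, seed=None):
    """Return (anchors, points): uniform noise of width dist around random anchors."""
    rng = random.Random(seed)
    # Anchors are plain uniform random positions in the field.
    anchors = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_anchors)]
    points = []
    for ax, ay in anchors:
        for _ in range(points_per_anchor):
            # anchor + dist * (random() - 0.5), applied to each coordinate
            x = ax + dist * (rng.random() - 0.5)
            y = ay + dist * (rng.random() - 0.5)
            points.append((x, y))
    return anchors, points

anchors, points = anchored_points(n_anchors=4, points_per_anchor=250, seed=1)
```

Each cluster is a square of side `dist` filled evenly, so the edges are sharp. That is fine for a quick test, but it looks less natural than a normal distribution around the anchor.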