Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bias in randomizing normally distributed numbers (javascript)

I’m having problems generating normally distributed random numbers (mu=0 sigma=1) using JavaScript.

I’ve tried Box-Muller's method and ziggurat, but the mean of the generated series of numbers comes out as 0.0015 or -0.0018 — very far from zero!! Over 500,000 randomly generated numbers this is a big issue. It should be close to zero, something like 0.000000000001.

I cannot figure out whether it’s a method problem, or whether JavaScript’s built-in Math.random() generates not exactly uniformly distributed numbers.

Has someone found similar problems?

Here you can find the ziggurat function:

http://www.filosophy.org/post/35/normaldistributed_random_values_in_javascript_using_the_ziggurat_algorithm/

And below is the code for the Box-Muller:

function rnd_bmt() {
    var x = 0, y = 0, rds, c;

    // Get two random numbers from -1 to 1.
    // If the radius is zero or greater than 1, throw them out and pick two
    // new ones. Rejection sampling throws away about 20% of the pairs.
    do {
        x = Math.random()*2-1;
        y = Math.random()*2-1;
        rds = x*x + y*y;
    }
    while (rds === 0 || rds > 1) 

    // This magic is the Box-Muller Transform
    c = Math.sqrt(-2*Math.log(rds)/rds);

    // It always creates a pair of numbers. I'll return them in an array. 
    // This function is quite efficient so don't be afraid to throw one away
    // if you don't need both.
    return [x*c, y*c];
}
like image 790
user1658162 Avatar asked Mar 07 '13 18:03

user1658162


1 Answers

If you generate n independent normal random variables, the standard deviation of the mean will be sigma / sqrt(n).

In your case n = 500000 and sigma = 1 so the standard error of the mean is approximately 1 / 707 = 0.0014. The 95% confidence interval, given 0 mean, would be around twice this or (-0.0028, 0.0028). Your sample means are well within this range.

Your expectation of obtaining 0.000000000001 (1e-12) is not mathematically grounded. To get within that range of accuracy, you would need to generate about 10^24 samples. At 10,000 samples per second that would still take 3 quadrillon years to do...this is precisely why it's good to avoid computing things by simulation if possible.

On the other hand, your algorithm does seem to be implemented correctly :)

like image 68
Andrew Mao Avatar answered Sep 21 '22 18:09

Andrew Mao