Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why isn't this random number generator thread-safe?

I was using rand() function to generate pseudo-random numbers between 0,1 for a simulation purpose, but when I decided to make my C++ code run in parallel (via OpenMP) I noticed rand() isn't thread-safe and also is not very uniform.

So I switched to using a (so-called) more uniform generator presented in many answers on other questions. Which looks like this

double rnd(const double & min, const double & max) {
    static thread_local mt19937* generator = nullptr;
    if (!generator) generator = new mt19937(clock() + omp_get_thread_num());
    uniform_real_distribution<double> distribution(min, max);
    return fabs(distribution(*generator));
}

But I saw many scientific errors in my original problem which I was simulating. Problems which were both against the results from rand() and also against common sense.

So I wrote a code to generate 500k random numbers with this function, calculate their average and do this for 200 times and plot the results.

double SUM=0;
for(r=0; r<=10; r+=0.05){   
    #pragma omp parallel for ordered schedule(static)
    for(w=1; w<=500000; w++){   
        double a;
        a=rnd(0,1);
        SUM=SUM+a;
    } 
    SUM=SUM/w_max;
    ft<<r<<'\t'<<SUM<<'\n';
    SUM=0;
}   

We know if instead of 500k I could do it for infinite times it should be a simple line with value 0.5. But with 500k we will have fluctuations around 0.5.

When running the code with a single thread, the result is acceptable:

enter image description here

But here is the result with 2 threads:

enter image description here

3 threads:

enter image description here

4 threads:

enter image description here

I do not have my 8 threaded CPU right now but the results were even worth there.

As you can see, They are both not uniform and very fluctuated around their average.

So is this pseudo-random generator thread-unsafe too?

Or am I making a mistake somewhere?

like image 726
Alireza Avatar asked May 15 '18 15:05

Alireza


2 Answers

There are three observations about your test output I would make:

  • It has much stronger variance than a good random source's average should provide. You observed this yourself by comparing to the single thread results.

  • The calculated average decreases with thread count and never reaches the original 0.5 (i.e. it's not just higher variance but also changed mean).

  • There is a temporal relation in the data, particularly visible in the 4 thread case.

All this is explained by the race condition present in your code: You assign to SUM from multiple threads. Incrementing a double is not an atomic operation (even on x86, where you'll probably get atomic reads and writes on registers). Two threads may read the current value (e.g. 10), increment it (e.g. both add 0.5) and then write the value back to memory. Now you have two threads writing a 10.5 instead of the correct 11.

The more threads try to write to SUM concurrently (without synchronization), the more of their changes are lost. This explains all observations:

  • How hard the threads race each other in each individual run determines how many results are lost, and this can vary from run to run.

  • The average is lower with more races (for example more threads) because more results are lost. You can never exceeed the statistical 0.5 average because you only ever lose writes.

  • As the threads and scheduler "settle in", the variance is reduced. This is a similar reason to why you should "warm up" your tests when benchmarking.

Needless to say, this is undefined behavior. It just shows benign behavior on x86 CPUs, but this is not something the C++ standard guarantees you. For all you know, the individual bytes of a double might be written to by different threads at the same time resulting in complete garbage.

The proper solution would be adding the doubles thread-locally and then (with synchronization) adding them together in the end. OMP has reduction clauses for this specific purpose.

For integral types, you could use std::atomic<IntegralType>::fetch_add(). std::atomic<double> exists but (before C++20) the mentioned function (and others) are only available for integral types.

like image 166
Max Langhof Avatar answered Nov 15 '22 18:11

Max Langhof


The problem is not in your RNG, but in your summation. There is simply a race condition on SUM. To fix this, use a reduction, e.g. change the pragma to:

#pragma omp parallel for ordered schedule(static) reduction(+:SUM)

Note that using thread_local with OpenMP is technically not defined behavior. It will probably work in practice, but the interaction between OpenMP and C++11 threading concepts is not well defined (see also this question). So the safe OpenMP alternative for you would be:

static mt19937* generator = nullptr;
#pragma omp threadprivate(generator)
like image 45
Zulan Avatar answered Nov 15 '22 17:11

Zulan