How to correctly use std::normal_distribution in C++11?

Tags: c++, c++11

I want to get random float numbers in the range [0.0,1.0], with most of these numbers around 0.5. So I came up with the following function:

static std::random_device __randomDevice;
static std::mt19937 __randomGen(__randomDevice());
static std::normal_distribution<float> __normalDistribution(0.5, 1);

// Get a normally distributed float value in the range [0,1].
inline float GetNormDistrFloat()
{
    float val = -1;
    do { val = __normalDistribution(__randomGen); } while(val < 0.0f || val > 1.0f);

    return val;
}

However, calling that function 1000 times leads to the following distribution:

0.0 - 0.25 : 240 times
0.25 - 0.5 : 262 times
0.5 - 0.75 : 248 times
0.75 - 1.0 : 250 times

I was expecting the first and last quarter of the range to show up much less often than shown above. So it seems I am doing something wrong here.

Any ideas?

asked Apr 28 '17 11:04 by Matthias


2 Answers

Short answer: do not chop off the tails of the normal distribution.

Long answer: The problem is that with a mean of 0.5 and a standard deviation of 1, the interval [0,1] covers only the central ±0.5 standard deviations, where the density is nearly flat. If you take a look at the normal distribution:

[Image: probability density function of the normal distribution]

The part you are using is very much at the center, and you would need many more samples to detect a difference. Just cutting off values outside your range is not going to give you a normally distributed sample.

You can see that the cumulative distribution function is almost linear in the [0,1] interval you are using:

[Image: cumulative distribution function of the normal distribution]

Pictures generated with Wolfram Alpha.

At this zoom level the density is only slightly peaked; the difference shows up once you draw more samples:

#include <iostream>
#include <random>

static std::random_device randomDevice;
static std::mt19937 randomGen(randomDevice());
static std::normal_distribution<float> normalDistribution(0.5f, 1.0f);

// Get a float in [0,1] by rejecting normal draws that fall outside that range.
inline float GetNormDistrFloat()
{
    float val = -1;
    do { val = normalDistribution(randomGen); }
    while(val < 0.0f || val > 1.0f);

    return val;
}

int main() {
    // Number of samples that land in each quarter of [0,1].
    int count1=0;
    int count2=0;
    int count3=0;
    int count4=0;
    for (int i =0; i< 1000000; i++) {
        float val = GetNormDistrFloat();
        if (val<0.25f){ count1++; continue;}
        if (val<0.5f){ count2++; continue;}
        if (val<0.75f){ count3++; continue;}
        count4++; // val is in [0.75, 1]
    }
    std::cout<<count1<<", "<<count2<<", "<<count3<<", "<<count4<<std::endl;
    return 0;
}

Output:

241395, 258131, 258275, 242199
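
These counts are what the normal CDF predicts for this setup. As a rough sketch (the helper names normalCdf and expectedQuarterShares are made up for this illustration and are not part of the original code), the expected share of accepted samples per quarter of [0,1] can be computed with std::erf:

```cpp
#include <array>
#include <cmath>

// CDF of a normal distribution with the given mean and standard deviation.
double normalCdf(double x, double mean, double stddev) {
    return 0.5 * (1.0 + std::erf((x - mean) / (stddev * std::sqrt(2.0))));
}

// Expected share of accepted samples in each quarter of [0,1] when draws
// from N(mean, stddev) that fall outside [0,1] are rejected.
std::array<double, 4> expectedQuarterShares(double mean, double stddev) {
    // Probability mass that survives the rejection step.
    const double total = normalCdf(1.0, mean, stddev) - normalCdf(0.0, mean, stddev);
    std::array<double, 4> shares{};
    for (int i = 0; i < 4; ++i) {
        const double lo = 0.25 * i;
        const double hi = 0.25 * (i + 1);
        shares[i] = (normalCdf(hi, mean, stddev) - normalCdf(lo, mean, stddev)) / total;
    }
    return shares;
}
```

Calling expectedQuarterShares(0.5, 1.0) gives roughly 24.2% for each outer quarter and 25.8% for each inner quarter, which is in line with the 1,000,000-sample run above.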

First option (suggested by Caleth): pass the normal sample through the logistic function 1 / (1 + exp(-x)), which has domain (−∞, +∞) and range (0,1). This way you use the full normal distribution and never discard a sample.
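
A minimal sketch of the logistic approach, assuming a normal distribution centred at 0 so that the mapped values are centred at 0.5 (the function name logisticNormal is hypothetical):

```cpp
#include <cmath>
#include <random>

// Draw one normal sample and squash it into the unit interval with the
// logistic function. No sample is ever rejected.
double logisticNormal(std::mt19937& gen, std::normal_distribution<double>& normal) {
    const double x = normal(gen);
    return 1.0 / (1.0 + std::exp(-x));
}
```

With std::normal_distribution<double> normal(0.0, 1.0) the results cluster around 0.5. A smaller standard deviation concentrates them more tightly. Note that the result follows a logit-normal distribution, not a normal one.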

Another option: It's not as nice mathematically as the one above, but probably faster. You can use a standard normal distribution with mean 0 and standard deviation 1 and then remap to [0,1] from a much larger range such as ±4 standard deviations. The problem is that the density then no longer integrates to 1 but to slightly less, because the tails beyond ±4 standard deviations are cut off, so it is not a proper probability distribution anymore.

If you want the total weight to be 1, you can redistribute the remaining tails (outside of 4 standard deviations) by not rerolling and instead drawing a uniformly distributed random value from the [0,1] interval in that case:

val = NormalRand(0,1);
if abs(val) < 4 return val/8 + 0.5
else return UniformRand(0,1)
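
The pseudocode above could be written in C++11 roughly as follows. This is only a sketch: NormalRand and UniformRand are replaced by std::normal_distribution and std::uniform_real_distribution, and the function name remappedNormal is made up.

```cpp
#include <cmath>
#include <random>

// Map a standard normal draw from (-4, 4) linearly onto (0, 1). Rare draws
// beyond 4 standard deviations are replaced by a uniform value on [0, 1)
// instead of being rerolled.
template <class Engine>
double remappedNormal(Engine& gen) {
    std::normal_distribution<double> normal(0.0, 1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double val = normal(gen);
    if (std::abs(val) < 4.0) {
        return val / 8.0 + 0.5;
    }
    return uniform(gen);
}
```

The result has mean 0.5 and a standard deviation of about 1/8, so roughly 95% of the values land in [0.25, 0.75].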

Another option (as suggested by interjay): simply decrease the standard deviation.
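
As an illustration of the smaller standard deviation, here is the rejection loop from the question again. The default of 0.15 is an arbitrary assumption chosen so that [0,1] spans about ±3.3 standard deviations, and the name truncatedNormal01 is hypothetical.

```cpp
#include <random>

// Rejection sampling as in the question, but with a standard deviation small
// enough that very few draws fall outside [0,1].
template <class Engine>
float truncatedNormal01(Engine& gen, float stddev = 0.15f) {
    std::normal_distribution<float> normal(0.5f, stddev);
    float val;
    do {
        val = normal(gen);
    } while (val < 0.0f || val > 1.0f);
    return val;
}
```

With this setting about 90% of the samples fall into the two middle quarters, so the outer quarters show up far less often, which is what the question was expecting.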

answered Sep 19 '22 23:09 by Beginner


It really helps to visualize this. I tend to like R, where I can also bring in C++ code easily. So here is a slightly modified version of your code, generating standard normals (i.e. not truncated) as well as normals truncated as you do:

#include <random>
#include <Rcpp.h>

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
std::vector<double> getNormals(int n) {
    std::vector<double> X(n);
    std::mt19937 engine(42);
    std::normal_distribution<> normal(0.0, 1.0);
    for (int i=0; i<n; i++) {
        X[i] = normal(engine);
    }
    return X;
}

// [[Rcpp::export]]
std::vector<double> getTruncatedNormals(int n) {
    std::vector<double> X(n);
    std::mt19937 engine(42);
    std::normal_distribution<> normal(0.0, 1.0);
    int i=0;
    while (i<n) {
        double x = normal(engine);
        if (x > -0.5 && x < 0.5) {
            X[i++] = x;
        }
    }
    return X;
}


/*** R
op <- par(mfrow=c(1,2)) # two plots side by side
x <- getNormals(1000)
hist(x, main="Normal")
z <- getTruncatedNormals(1000)
hist(z, main="Truncated")
par(op)
*/

In an R session with the Rcpp package, I can just call Rcpp::sourceCpp("code.cpp") on the file; the code compiles, loads the two C++ functions, and runs the R part at the end. I get this chart:

[Image: histograms of the normal and the truncated samples, n=1000]

And even at just 1000 draws, we see the bell curve of the normal, and the near-uniform shape you get when only going 1/2 to each side of the mean with a standard deviation of 1.

Long story short: OP knows how to create a distribution, even a truncated one, but now needs to figure out which distribution he wants.

Edit: At n=1e6 we see the curvature of the normal even in the truncated case:

[Image: histograms of the normal and the truncated samples, n=1e6]

answered Sep 16 '22 23:09 by Dirk Eddelbuettel