I want to get random float numbers in the range [0.0,1.0]
, so most of these numbers should be around 0.5
. Thus I came up with the following function:
static std::random_device __randomDevice;
static std::mt19937 __randomGen(__randomDevice());
static std::normal_distribution<float> __normalDistribution(0.5, 1);
// Get a normally distributed float value in the range [0,1].
inline float GetNormDistrFloat()
{
float val = -1;
do { val = __normalDistribution(__randomGen); } while(val < 0.0f || val > 1.0f);
return val;
}
However, calling that function 1000 times leads to the following distribution:
0.0 - 0.25 : 240 times
0.25 - 0.5 : 262 times
0.5 - 0.75 : 248 times
0.75 - 1.0 : 250 times
I was expecting the first and last quarter of the range to show up much fewer than what is shown above. So it seems I am doing something wrong here.
Any ideas?
Short answer: do not chop off the tails of the normal distribution.
Long answer: The problem is that with a standard deviation of 1 you have most values inside the interval [0,1]. If you take a look at the normal distribution:
The part you are using is very much at the center and you would need many more samples to detect a difference. Just cutting of values outside your range is absolutely not going to give you a normal distributed sample.
You can see that the cumulative densitiy function is almost linear in the [0,1] interval you are using:
Pictures generated with wolfram alpha.
At this zoom in the shape of the distribution is almost triangular, and you can check the output here for more samples:
#include <iostream>
#include <random>
using namespace std;
static std::random_device __randomDevice;
static std::mt19937 __randomGen(__randomDevice());
static std::normal_distribution<float> __normalDistribution(0.5, 1);
// Get a normally distributed float value in the range [0,1].
inline float GetNormDistrFloat()
{
float val = -1;
do { val = __normalDistribution(__randomGen); }
while(val < 0.0f || val > 1.0f);
return val;
}
int main() {
int count1=0;
int count2=0;
int count3=0;
int count4=0;
for (int i =0; i< 1000000; i++) {
float val = GetNormDistrFloat();
if (val<0.25){ count1++; continue;}
if (val<0.5){ count2++; continue;}
if (val<0.75){ count3++; continue;}
if (val<1){ count4++; continue;}
}
std::cout<<count1<<", "<<count2<<", "<<count3<<", "<<count4<<std::endl;
return 0;
}
Success time: 0.1 memory: 16072 signal:0
241395, 258131, 258275, 242199
First Option (suggested by Caleth): use (the) logistic function 1 / (1 + exp(-x)), which has a domain (−∞, +∞) and range [0,1]. This way you actually get the full normal distribution.
Another option: Its not as nice mathematically as the one above, but probably faster. You can use a standard normal distribution with mean 0 and deviation 1 and then remap to [0,1]
from a much larger range such as +/- 4 standard deviations. Now you have the problem that the weight of your integral is not longer 1 but a little less. Its not actually a random variable anymore.
If you want to get a weight of 1, you can distribute the remaining tails (outside of 4 stds) by not rerolling but by getting a uniformly distributed random value from the [0,1]
interval, this case:
val = NormalRand(0,1);
if abs(val) < 4 return val/8 + 0.5
else return UniformRand(0,1)
Another option (as suggested by interjay): simply decrease the standard deviation.
It really helps to visualize this. I tend to like R where I can also bring in C++ code easily. So here is a slightly modified version of your code, generated standard normals (ie not truncated) and truncated as you do:
#include <random>
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
std::vector<double> getNormals(int n) {
std::vector<double> X(n);
std::mt19937 engine(42);
std::normal_distribution<> normal(0.0, 1.0);
for (int i=0; i<n; i++) {
X[i] = normal(engine);
}
return X;
}
// [[Rcpp::export]]
std::vector<double> getTruncatedNormals(int n) {
std::vector<double> X(n);
std::mt19937 engine(42);
std::normal_distribution<> normal(0.0, 1.0);
int i=0;
while (i<n) {
double x = normal(engine);
if (x > -0.5 && x < 0.5) {
X[i++] = x;
}
}
return X;
}
/*** R
op <- par(mfrow=c(1,2)) # two plot
x <- getNormals(1000)
hist(x, main="Normal")
z <- getTruncatedNormals(1000)
hist(z, main="Truncated")
par(op)
*/
In an R session with the Rcpp package, I can just call Rcpp::sourceCpp("code.cpp")
on the file and the code compiles, loads the two C++ functions and runs the R part at the end. I get this chart:
And even at just 1000 draws, we see the bell curve of the normal, and the near uniform you get when only going 1/2 each side of the mean under a standard deviation of 1.
Long story short: OP knows how to create a distribution, even a truncated one, but now needs to figure out which distribution he wants.
Edit: At n=1e6 we see the curvate of the Normal even for the truncated case:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With