In at least one implementation of the standard library, the first invocation of a std::uniform_int_distribution<>
does not return a random value, but rather the distribution's min value. That is, given the code:
default_random_engine engine( any_seed() );
uniform_int_distribution< int > distribution( smaller, larger );
auto x = distribution( engine );
assert( x == smaller );
...x
will in fact be smaller
for any values of any_seed()
, smaller
, or larger
.
To play along at home, you can try a code sample that demonstrates this problem in gcc 4.8.1.
I trust this is not correct behavior? If it is correct behavior, why would a random distribution return this clearly non-random value?
A very useful tip to make C++ generate random numbers is to use the time() method. By seeding the generator with the same number, you are more likely to get the same random number each time. Therefore, if you use the C++ srand() with the current time, the generated random number will always be different.
The length, before a pseudorandom sequence repeats, is called "the period". The period is strictly limited by the length of the initial seed. For example, if we use a two-digit seed, then an algorithm can produce, at most, 100 numbers, before reusing a seed and repeating the cycle.
This is how uniform_int_distribution
maps the random bits to numbers if the range of possible outcomes is smaller than the range of number the rng produces:
const __uctype __uerange = __urange + 1; // __urange can be zero
const __uctype __scaling = __urngrange / __uerange;
const __uctype __past = __uerange * __scaling;
do
__ret = __uctype(__urng()) - __urngmin;
while (__ret >= __past);
__ret /= __scaling;
where __urange
is larger - smaller
and __urngrange
is the difference between the maximum and the minimum value the rng can return. (Code from bits/uniform_int_dist.h in libstdc++ 6.1)
In our case, the rng default_random_engine
is a minstd_rand0
, which yields __scaling == 195225785
for the range [0,10] you tested with. Thus, if rng() < 195225785
, the distribution will return 0.
The first number a minstd_rand0
returns is
(16807 * seed) % 2147483647
(where seed == 0
gets adjusted to 1
btw). We can thus see that the first value produced by a minstd_rand0
seeded with a number smaller than 11615 will yield 0 with the uniform_int_distribution< int > distribution( 0, 10 );
you used. (mod off-by-one-errors on my part. ;) )
You mentioned the problem going away for bigger seeds: As soon as the seeds get big enough to actually make the mod operation do something, we cannot simply assign a whole range of values to the same output by division, so the results will look better.
No. You introduced significant bias in what is supposed to be a random 32 bit seed by always choosing it small. That bias showing up in the results is not surprising or evil. For random seeds, even your minstd_rand0
will yield a fairly uniformly random first value. (Though the sequence of numbers after that will not be of great statistical quality.)
Case 1: You want random number of high statistical quality.
For that, you use a better rng like mt19937
and seed its entire state space. For the Mersenne Twister, that's 624 32-bit integers. (For reference, here is my attempt to do this properly with some helpful suggestions in the answer.)
Case 2: You really want to use those small seeds only.
We can still get decent results out of this. The problem is that pseudo random number generators commonly depend "somewhat continuously" on their seed. To ship around this, we discard enough numbers to let the initially similar sequences of output diverge. So if your seed must be small, you can initialize your rng like this:
std::mt19937 rng(smallSeed);
rng.discard(700000);
It is vital that you use a good rng like the Mersenne Twister for this. I do not know of any method to get even decent values out of a poorly seeded minstd_rand0
, for example see this train-wreck. Even if seeded properly, the statistical properties of a mt19937
are superior by far.
Concerns about the large state space or slow generation you sometimes hear about are usually of no concern outside the embedded world. According to boost and cacert.at, the MT is even way faster than minstd_rand0
.
You still need to do the discard trick though, even if your results look good to the naked eye without. It takes less than a millisecond on my system, and you don't seed rngs very often, so there is no reason not to.
Note that I am not able to give you a sharp estimate for the number of discards we need, I took that value from this answer, it links this paper for a rational. I don't have the time to work through that right now.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With