Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does linear_congruential_engine::seed(Sseq) discard three numbers generated by the seed sequence?

Tags:

c++

The C++ standard (all the way from C++11 to the current C++17 draft) says the following in [rand.eng.lcong]:

template<class Sseq> explicit linear_congruential_engine(Sseq& q);

Effects: Constructs a linear_congruential_engine object. With k = ⌈log2(m) ÷ 32⌉ and a an array 32 (or equivalent) of length k + 3, invokes q.generate(a + 0, a + k + 3) and then computes S = (∑j=0k−1aj+3 · 232​j) mod m. If c mod m is 0 and S is 0, sets the engine’s state to 1, else sets the engine’s state to S.

Why are a0, a1 and a2 discarded?

like image 988
Chortos-2 Avatar asked Jun 20 '16 16:06

Chortos-2


1 Answers

This is something I've been trying to figure out myself. I had a hypothesis, but no real evidence for it. However, the document the rule originated in (N2079 as linked by @T.C.) confirmed part of my theory.

Note that the function that the rule comes from takes in a std::seed_seq object in the document, and not a templated class. This means that when the rule was written, it was made specifically for std::seed_seq, and not the concept of a SeedSequence in general. This means that we can look at the std::seed_seq class for information about this, specifically with how std::seed_seq::generate is defined.

The method that std::seed_seq::generate uses is well explained on cppreference.com. It's a bit complicated, but it can be summarized in 4 stages.

  1. Initialize the output range with some initial data (I'm including k=0 here)

  2. Move the original seed data into the output range (k=1..s)

  3. Extend the seed data into the rest of the output range (k=s+1..m-1 where m=max(s+1, n))

  4. Shuffle the data in the output range (k=m..m+n-1)

When seeding a std::linear_congruential_engine with the modulo <= 232 (which includes std::minstd_rand and std::minstd_rand0), it should only need to generate 1 value from std::seed_seq, but with this rule, it generates 4. So what changes in this algorithm when changing n from 1 to 4?

One part that changes is that the shuffling stage goes from 1 iteration to 4. Since one of the goals of std::seed_seq is to produce high quality seeding values "given a small seed or a poorly distributed initial seed sequence.`, these extra shuffling iterations likely improve the resulting seed value. The reason why it drops the first 3 values instead of the last is because the later values are (usually) the ones that get shuffled more.

It's also worth noting that a key equation of all 4 stages is the value begin[k]^begin[k+p]^begin[k−1] (with XOR replaced with addition in the last stage). When n=1, this simply becomes begin[k] (or 3*begin[k] for the last stage) (note that "the indexing of the output range begin[x] is taken modulo n" and x % 1 == 0). When n=4, this equation works more like it was intended to, which helps shuffle the data around more effectively.

So the short answer is that std::linear_congruential_engine discards 3 numbers generated by the seed sequence because having std::seed_seq generate those numbers improves the quality of the values it actually uses. Now, the decision to discard these numbers in the generator was decided before the generic concept of SeedSequence was defined, so it made more sense to solve the issue in the generator, instead of overcomplicating the seed sequence class. However, this now means that the first 3 values generated by any seed sequence are discarded. Whether or not that's worth it is probably debatable, but it is the way it is now.

like image 184
LRFLEW Avatar answered Oct 15 '22 13:10

LRFLEW