Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which of <random>'s random number engines should one actually use in practice? std::mt19937?

Tags:

Suppose you want to use C++ <random> facilities in a practical program (for some definition of "practical" — the constraints here are kind of part of this question). You've got code roughly like this:

int main(int argc, char **argv) {     int seed = get_user_provided_seed_value(argc, argv);     if (seed == 0) seed = std::random_device()();     ENGINE g(seed);  // TODO: proper seeding?     go_on_and_use(g); } 

My question is, what type should you use for ENGINE?

  • I used to always say std::mt19937 because it was quick to type and had name recognition. But these days it seems like everyone's saying that the Mersenne Twister is very heavyweight and cache-unfriendly and doesn't even pass all the statistical tests that others do.

  • I'd like to say std::default_random_engine because it's the obvious "default." But I don't know if it varies from platform to platform, and I don't know if it's statistically any good.

  • Since everyone's on a 64-bit platform these days, should we at least be using std::mt19937_64 over std::mt19937?

  • I'd like to say pcg64 or xoroshiro128 because they seem well-respected and lightweight, but they don't exist in <random> at all.

  • I don't know anything about minstd_rand, minstd_rand0, ranlux24, knuth_b, etc. — surely they must be good for something?

Obviously there are some competing constraints here.

  • Strength of the engine. (<random> has no cryptographically strong PRNGs, but still, some of the standardized ones are "weaker" than others, right?)

  • sizeof the engine.

  • Speed of its operator().

  • Ease of seeding. mt19937 is notoriously hard to seed properly because it has so much state to initialize.

  • Portability between library vendors. If one vendor's foo_engine produces different numbers from another vendor's foo_engine, that's not good for some applications. (Hopefully this rules out nothing except maybe default_random_engine.)

Weighing all these constraints as best you can, what would you say is the ultimate "best-practice staying-within-the-standard-library" answer? Should I just keep using std::mt19937, or what?

like image 736
Quuxplusone Avatar asked Feb 09 '20 05:02

Quuxplusone


People also ask

What is std :: mt19937?

std::mt19937(since C++11) class is a very efficient pseudo-random number generator and is defined in a random header file. It produces 32-bit pseudo-random numbers using the well-known and popular algorithm named Mersenne twister algorithm.

What is default random engine C++?

Default random engine. This is a random number engine class that generates pseudo-random numbers. It is the library implemention's selection of a generator that provides at least acceptable engine behavior for relatively casual, inexpert, and/or lightweight use.

How do you generate a random number in C++?

One way to generate these numbers in C++ is to use the function rand(). Rand is defined as: #include <cstdlib> int rand(); The rand function takes no arguments and returns an integer that is a pseudo-random number between 0 and RAND_MAX.


2 Answers

C++ Reference lists all the random engines currently provided by C++. However, the selection of engines leaves a lot to be desired (e.g., see my list of high-quality random generators). For instance:

  • default_random_engine is implementation-defined, so it's unknown whether the engine has statistical flaws that the application may care about.
  • linear_congruential_engine implements linear congruential generators. However, they tend to have poor quality unless the modulus is prime and very big (at least 64 bits). Also, they can't admit more seeds than their modulus.
  • minstd_rand0 and minstd_rand admit only about 2^31 seeds. knuth_b wraps a minstd_rand0 and does a Bays–Durham shuffle of it.
  • mt19937 and mt19937_64 could admit a lot more seeds if they were better initialized (e.g., by initializing a std::seed_seq with multiple outputs of random_device, not just one), but they use about 2500 bytes of state.
  • ranlux24 and ranlux48 use about 577 bits of state but they are slow (they work by keeping some and discarding other pseudorandom outputs).

However, C++ also has two engines that wrap another engine to potentially improve its randomness properties:

  • discard_block_engine discards some of the outputs of a given random engine.
  • shuffle_order_engine implements a Bays–Durham shuffle of a given random engine.

For instance, it's possible, say, to have a Bays–Durham shuffle of mt19937, ranlux24, or a custom linear_congruential_engine with shuffle_order_engine. Perhaps the wrapped engine is better quality than the original one. However, it's hard to predict the new engine's statistical quality without testing it.

Thus, pending such tests, it seems that mt19937 is the most practical engine in the C++ standard for now. I am aware, however, of at least one proposal to add another random number engine to future versions of C++ (see C++ paper P2075).

like image 162
Peter O. Avatar answered Sep 30 '22 05:09

Peter O.


According to C++ Reference, default_random_engine:

Is the library implemention's selection of a generator that provides at least acceptable engine behavior for relatively casual, inexpert, and/or lightweight use.

So for lightweight use you don't need to be worry about anything, seed default_random_engine with Epoch Time (time(0)) and that would be fine enough ;)

like image 32
Farbod Ahmadian Avatar answered Sep 30 '22 03:09

Farbod Ahmadian