So I saw a talk called rand() Considered Harmful, and it advocated for using the engine-distribution paradigm of random number generation over the simple std::rand() plus modulus paradigm.
However, I wanted to see the failings of std::rand() firsthand, so I did a quick experiment: I wrote two functions, getRandNum_Old() and getRandNum_New(), that generated a random number between 0 and 5 inclusive using std::rand() and std::mt19937 + std::uniform_int_distribution respectively. Here were the results:
[OLD WAY]
Spread
 mean: 346.554406
 std dev: 110.318361
Time Taken (ms)
 mean: 6.662910
 std dev: 0.366301

[NEW WAY]
Spread
 mean: 350.346792
 std dev: 110.449190
Time Taken (ms)
 mean: 28.053907
 std dev: 0.654964
Surprisingly, the aggregate spread of rolls was the same for both methods. That is, std::mt19937 + std::uniform_int_distribution was not "more uniform" than simple std::rand() + %. Another observation I made was that the new way was about 4x slower than the old way. Overall, it seemed like I was paying a huge cost in speed for almost no gain in quality.
Is my experiment flawed in some way? Or is std::rand() really not that bad, and maybe even better?
For reference, here is the code I used in its entirety:
#include <cstdio>
#include <cstdlib>   // std::rand, std::srand
#include <ctime>     // std::time, for seeding std::rand
#include <cmath>     // std::sqrt
#include <random>
#include <algorithm>
#include <chrono>

int getRandNum_Old() {
    static bool init = false;
    if (!init) {
        std::srand(std::time(nullptr)); // Seed std::rand
        init = true;
    }
    return std::rand() % 6;
}

int getRandNum_New() {
    static bool init = false;
    static std::random_device rd;
    static std::mt19937 eng;
    static std::uniform_int_distribution<int> dist(0,5);
    if (!init) {
        eng.seed(rd()); // Seed random engine
        init = true;
    }
    return dist(eng);
}

template <typename T>
double mean(T* data, int n) {
    double m = 0;
    std::for_each(data, data+n, [&](T x){ m += x; });
    m /= n;
    return m;
}

template <typename T>
double stdDev(T* data, int n) {
    double m = mean(data, n);
    double sd = 0.0;
    std::for_each(data, data+n, [&](T x){ sd += ((x-m) * (x-m)); });
    sd /= n;
    sd = std::sqrt(sd);
    return sd;
}

int main() {
    const int N = 960000; // Number of trials
    const int M = 1000;   // Number of simulations
    const int D = 6;      // Num sides on die

    /* Do the things the "old" way (blech) */
    int freqList_Old[D];
    double stdDevList_Old[M];
    double timeTakenList_Old[M];
    for (int j = 0; j < M; j++) {
        auto start = std::chrono::high_resolution_clock::now();
        std::fill_n(freqList_Old, D, 0);
        for (int i = 0; i < N; i++) {
            int roll = getRandNum_Old();
            freqList_Old[roll] += 1;
        }
        stdDevList_Old[j] = stdDev(freqList_Old, D);
        auto end = std::chrono::high_resolution_clock::now();
        auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
        double timeTaken = dur.count() / 1000.0;
        timeTakenList_Old[j] = timeTaken;
    }

    /* Do the things the cool new way! */
    int freqList_New[D];
    double stdDevList_New[M];
    double timeTakenList_New[M];
    for (int j = 0; j < M; j++) {
        auto start = std::chrono::high_resolution_clock::now();
        std::fill_n(freqList_New, D, 0);
        for (int i = 0; i < N; i++) {
            int roll = getRandNum_New();
            freqList_New[roll] += 1;
        }
        stdDevList_New[j] = stdDev(freqList_New, D);
        auto end = std::chrono::high_resolution_clock::now();
        auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
        double timeTaken = dur.count() / 1000.0;
        timeTakenList_New[j] = timeTaken;
    }

    /* Display Results */
    printf("[OLD WAY]\n");
    printf("Spread\n");
    printf(" mean: %.6f\n", mean(stdDevList_Old, M));
    printf(" std dev: %.6f\n", stdDev(stdDevList_Old, M));
    printf("Time Taken (ms)\n");
    printf(" mean: %.6f\n", mean(timeTakenList_Old, M));
    printf(" std dev: %.6f\n", stdDev(timeTakenList_Old, M));
    printf("\n");
    printf("[NEW WAY]\n");
    printf("Spread\n");
    printf(" mean: %.6f\n", mean(stdDevList_New, M));
    printf(" std dev: %.6f\n", stdDev(stdDevList_New, M));
    printf("Time Taken (ms)\n");
    printf(" mean: %.6f\n", mean(timeTakenList_New, M));
    printf(" std dev: %.6f\n", stdDev(timeTakenList_New, M));
}
Pretty much any implementation of "old" rand() uses an LCG; while LCGs are generally not the best generators around, usually you are not going to see them fail on such a basic test: mean and standard deviation generally come out right even for the worst PRNGs.
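For a concrete idea of what that typically means, here is a minimal sketch in the spirit of the C standard's well-known sample rand() implementation (the constants are the ones from that example; any real C library may differ, and the function names are mine):

static unsigned long next_state = 1;

int sample_rand(void)                 // illustrative stand-in for rand()
{
    next_state = next_state * 1103515245 + 12345;
    return (int)((next_state / 65536) % 32768);   // yields 0..32767
}

void sample_srand(unsigned int seed)  // illustrative stand-in for srand()
{
    next_state = seed;
}

Each individual output of even such a weak generator is close to uniformly distributed over its range, which is why a mean/standard-deviation test like yours cannot tell it apart from a far better engine.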
Common failings of "bad" - but common enough - rand() implementations are:

- low randomness of the low-order bits;
- short period;
- low RAND_MAX;
- some correlation between successive extractions.

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algorithmically speaking, obtain a state-of-the-art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.
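As an illustration of that point, a library could internally do something like the following while exposing exactly the same srand/rand API to callers; this is only a sketch with made-up internal names, using a plain Marsaglia xorshift64 step:

#include <cstdint>

static std::uint64_t xs_state = 88172645463325252ull;  // must stay nonzero

void lib_srand(unsigned int seed)      // what srand() could do internally
{
    xs_state = seed ? seed : 1;
}

int lib_rand(void)                     // what rand() could do internally
{
    xs_state ^= xs_state << 13;        // xorshift64 step
    xs_state ^= xs_state >> 7;
    xs_state ^= xs_state << 17;
    return (int)(xs_state >> 33);      // top 31 bits -> 0..2^31-1
}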
Edit: @R. correctly notes that the rand/srand interface is limited by the fact that srand takes an unsigned int, so any generator an implementation may put behind them is intrinsically limited to UINT_MAX possible starting seeds (and thus generated sequences). This is true indeed, although the API could be trivially extended to make srand take an unsigned long long, or by adding a separate srand(unsigned char *, size_t) overload.
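For comparison, the <random> facilities discussed below already allow this: an engine can be seeded from a std::seed_seq carrying several words of entropy, so the reachable initial states are not capped at UINT_MAX distinct seeds. A minimal sketch (the helper name is mine):

#include <random>

std::mt19937 make_engine()
{
    std::random_device rd;
    // Eight 32-bit words of seed material instead of a single unsigned int.
    std::seed_seq seq{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
    return std::mt19937{seq};
}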
Indeed, the actual problem with rand() is not so much the implementation in principle, but:

- backwards compatibility; many existing implementations use subpar generators, often with a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);
- simplistic interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems with multithreaded code (you need either a global lock or thread-local state) and whenever you need a "private", reproducible sequence in one module of your program that doesn't touch the global state (see the sketch just below).
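Jumping ahead a bit to the <random> facilities discussed below, both problems go away once the generator is just an object you can instantiate where you need it. A sketch, with names of my own choosing:

#include <random>

// Per-thread engine: no locking, and each thread gets an independent sequence.
int threadLocalRoll()
{
    thread_local std::mt19937 eng{std::random_device{}()};
    thread_local std::uniform_int_distribution<int> dist{0, 5};
    return dist(eng);
}

// A "private", reproducible sequence owned by one module, unaffected by (and
// not affecting) whatever the rest of the program does with random numbers.
class LootTable
{
public:
    explicit LootTable(unsigned int seed) : eng_(seed) {}
    int roll() { return dist_(eng_); }
private:
    std::mt19937 eng_;
    std::uniform_int_distribution<int> dist_{0, 5};
};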
Finally, the rand state of affairs:

- doesn't specify an actual algorithm, so any attempt at getting exactly reproducible sequences across different implementations is hopeless;
- doesn't provide a portable way to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess by providing algorithms that are:

- fully specified, so you get reproducible output across implementations and guaranteed characteristics - say, the range of a given generator (illustrated below);
- generally of state-of-the-art quality;
- encapsulated in classes, so no global state is forced upon you and the threading/"private sequence" problems above disappear;

... and a default random_device as well to seed them.
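"Fully specified" is not an abstract nicety; for example, the standard pins std::mt19937 down to the point that the 10000th output of a default-constructed engine must be 4123659995 on every conforming implementation ([rand.predef]):

#include <cassert>
#include <random>

int main()
{
    std::mt19937 eng;               // default seed is 5489u
    eng.discard(9999);
    assert(eng() == 4123659995u);   // guaranteed by the standard
}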
Now, if you ask me, I would have also liked a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python provides the "complicated" API, but also the trivial random.randint & Co., which use a global, pre-seeded PRNG for us uncomplicated people who'd rather not drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards). But it's true that you can easily build it yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).
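For what it's worth, such a "trivial" front end is a handful of lines over <random>; a sketch (the name randint and the choice of a single lazily-seeded global engine are mine, not anything standard):

#include <random>

// Python-style randint(lo, hi): both bounds inclusive.
int randint(int lo, int hi)
{
    static std::mt19937 eng{std::random_device{}()};   // seeded once, shared
    return std::uniform_int_distribution<int>{lo, hi}(eng);
}

// usage: int roll = randint(0, 5);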
Finally, to get back to your performance comparison: as others have specified, you are comparing a fast LCG with a slower (but generally considered better quality) Mersenne Twister; if you are ok with the quality of an LCG, you can use std::minstd_rand instead of std::mt19937.
Indeed, after tweaking your function to use std::minstd_rand and avoid the useless static variables for initialization
int getRandNum_New() {
    static std::minstd_rand eng{std::random_device{}()};
    static std::uniform_int_distribution<int> dist{0, 5};
    return dist(eng);
}
I get 9 ms (old) vs 21 ms (new); finally, if I get rid of dist (which, compared to the classic modulo operator, handles the distribution skew when the output range is not a multiple of the input range) and go back to what you are doing in getRandNum_Old()
int getRandNum_New() {
    static std::minstd_rand eng{std::random_device{}()};
    return eng() % 6;
}
I get it down to 6 ms (so, 30% faster), probably because, unlike the call to rand(), std::minstd_rand is easier to inline.
Incidentally, I did the same test using a hand-rolled (but pretty much conforming to the standard library interface) XorShift64*, and it's 2.3 times faster than rand() (3.68 ms vs 8.61 ms); given that, unlike the Mersenne Twister and the various provided LCGs, it passes the current randomness test suites with flying colors and it's blazingly fast, it makes you wonder why it isn't included in the standard library yet.
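For the curious, a hand-rolled xorshift64* along those lines only needs result_type, min(), max() and operator() to be usable with the standard distributions; here is a sketch (the class name and default seed are my choices, the shift and multiplier constants are Vigna's published ones):

#include <cstdint>
#include <random>

class xorshift64s
{
public:
    using result_type = std::uint64_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return UINT64_MAX; }

    explicit xorshift64s(result_type seed = 0x9E3779B97F4A7C15ull)
        : state_(seed ? seed : 1) {}        // the state must never be zero

    result_type operator()()
    {
        state_ ^= state_ >> 12;
        state_ ^= state_ << 25;
        state_ ^= state_ >> 27;
        return state_ * 0x2545F4914F6CDD1Dull;
    }

private:
    result_type state_;
};

// Drop-in usage with a distribution, exactly like std::mt19937:
//   xorshift64s eng{std::random_device{}()};
//   std::uniform_int_distribution<int> dist{0, 5};
//   int roll = dist(eng);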
If you repeat your experiment with a range larger than 5, then you will probably see different results. When your range is significantly smaller than RAND_MAX, there isn't an issue for most applications.
For example, if we have a RAND_MAX of 25, then rand() % 5 will produce numbers with the following frequencies:
0: 6
1: 5
2: 5
3: 5
4: 5
As RAND_MAX is guaranteed to be at least 32767, and the difference in frequency between the least likely and the most likely value is only 1, for small ranges the distribution is near enough to uniform for most use cases.
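If you want to see the skew concretely for any combination of RAND_MAX and range, counting how many source values land on each residue takes only a few lines; with the toy values from above this reproduces the 6/5/5/5/5 table:

#include <cstdio>

int main()
{
    const int randMax = 25;   // toy stand-in for RAND_MAX
    const int range   = 5;

    int freq[range] = {0};
    for (int v = 0; v <= randMax; ++v)   // every value rand() could return
        freq[v % range] += 1;            // ...and the residue it maps to

    for (int r = 0; r < range; ++r)
        std::printf("%d: %d\n", r, freq[r]);
}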