Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

General purpose random number generation

Tags:

c++

random

c++11

C++11 introduced a vastly superior random number library to C's rand(). In C, you often see the following code:

srand(time(0));
rand() % MAX + MIN;

Because time(0) returns the current time in seconds, rapid successive calls to the program will produce the same sequence of numbers. The quick fix to this is to provide a seed in nanoseconds:

 struct timeval time; 
 gettimeofday(&time,NULL);
 srand((time.tv_sec * 1000) + (time.tv_usec / 1000));

Of course this doesn't change the fact that rand() is universally seen as bad and superior alternatives are either non-portable (like Linux's random()) or rely on a third party library (like Boost).

In C++11, the shortest program I'm aware of to produce good random numbers is:

#include <iostream>
#include <random>

int main()
{
    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_int_distribution<int> dist(1, 10);
    std::cout << dist(mt);
}

std::random_device is non-portable and std::default_random_engine is discouraged because it may choose a poor engine, such as std::rand. In fact, std::random_shuffle is deprecated and std::shuffle is preferred for this reason. Generally, I see people saying to use chrono to provide a seed instead:

std::chrono::high_resolution_clock::now().time_since_epoch().count()

This is not only hard to remember, but looks even uglier when we want to use nanoseconds instead:

using namespace std::chrono;
std::mt19937 mt(duration_cast<nanoseconds>(high_resolution_clock::now()
                                      .time_since_epoch()).count());
  • The C approach looks desirable because it doesn't require as much boilerplate.

  • random_device is easiest because it doesn't require a ugly one-liner even though it is non-portable.

  • mt19937 is harder to remember than default_random_engine.

Which is the best approach?

like image 970
user4155618 Avatar asked Nov 11 '22 00:11

user4155618


1 Answers

(1) know the available generators and pick the one most suitable for the job

(2) cook the seed entropy, draw a standard measure (like 256 bits), print it to the log

(3) turn your standard seed block into a seed_seq of a size appropriate for the generator in question and seed the genny

Regarding (1): the generators in the standard library are a bit tricky to use because they all have some peculiarities, and they all fail standard PRNG tests like TestU01 systematically. You have to know their particular defects in order to judge their applicability. Failing that, take mt19937 or ranlux, seed them well and hope for the best. Using a typedef - your own - allows you to switch and experiment with different gennies. typeid(rng_t).name() sees through the disguise and logs the actual name.

Regarding (2): you cannot pass raw, lumpy entropy to the seeding procedures; if you do then small seed differences will result only in small differences of state. The entropy has to be cooked into a nice smooth mush where every bit depends with 50% probability on every bit of the original input. This includes inputs like 1, 2, 3, ... Taking a fixed standard amount of the bit soup makes the whole thing manageable, like printing to screen or log to ensure repeatability if necessary. Needless to say, if you use seed numbers like 1, 2, 42, ... instead of random seeds then you can print those to the log, not the bit soup extract. Using your own bit grinder means your are not at the mercy of half-assed seeding functions, and even 'deficient' seeds like 1, 2, 3 and so on give you wildly different generator states (sequences).

Regarding (3): some generators - like mt19937 - have huge internal state and so you need to stretch your 256-bit (or whatever) standard seed quite a lot. Unfortunately, the standard library does not contain any generators that are well suited for this task, and there are no adapters for turning a generator into a seed_seq either.

I'd use an xorshift*, a KISS, a ran (Numerical Recipes), or a 4x32 Tausworthe (a.k.a. lfsr113) but none of those are in the library. The library doesn't have any suitable mixing functions (bit grinders) either.

I posted the code for the murmur mixer - a simple and extremely efficient bit mixing function - in a similar topic; I'm giving the classic KISS and a Tausworthe here since I couldn't find suitable, clean references on the net.

struct KISS {  uint32_t a, b, c, d; ... };

uint32_t KISS::cycle ()
{
   a = (a & 0xFFFF) * 36969 + (a >> 16);         // 16-bit MWC, a.k.a. znew()
   b = (b & 0xFFFF) * 18000 + (b >> 16);         // 16-bit MWC, a.k.a. wnew()
   c = c * 69069 + 1234567;                      // 32-bit LCG, a.k.a. CONG()(
   d ^= d << 13;  d ^= d >> 17;  d ^= d << 5;    // 32-bit XorShift a.k.a. SHR3(), corrected

   return (((a << 16) | (b & 0xFFFF)) ^ c) + d;  // mixing function (combiner)
}

The combined Tausworthe:

struct LFSR113 {  uint32_t a, b, c, d; ... };

uint32_t LFSR113::cycle ()
{
   a = ((a ^ (a <<  6)) >> 13) ^ ((a & ~0x01) << 18);  // 31 bits
   b = ((b ^ (b <<  2)) >> 27) ^ ((b & ~0x07) <<  2);  // 29 bits
   c = ((c ^ (c << 13)) >> 21) ^ ((c & ~0x0F) <<  7);  // 28 bits
   d = ((d ^ (d <<  3)) >> 12) ^ ((d & ~0x7F) << 13);  // 25 bits

   return a ^ b ^ c ^ d;
}

For use as primary generators you'd have to adjust forbidden seeds (sticky states) but for seed stretching (making a seed_seq) this can be safely ignored. There are plenty of alternatives, like using a std::vector and one of the simple generators (LCG) to make a decent seed_seq, but I prefer tried, trusted & thoroughly analysed solutions with maximum bang for the least amount of code.

The two 4x32 generators shown here can be stepped using the Chinese Remainder Theorem, and conversely, any state can be mapped to its unique point in the overall sequence (glossing over things like orbits for the moment). This makes them and other, similar generators attractive as primary generators for general use whenever big guns like xorshift1024* (or mt19937) are not necessary.

In any case you'll need quite a bit of code - e.g. templates in a header file - in order to make the standard <random> generators easy, comfortable and safe to use. But it is 100% worth the effort. The generators aren't too hot but they are serviceable; the rest of the infrastructure is quite decent and it can go a long way towards solving your problems.

P.S.: some implementations (VC++) allow it to pass any generator to the seed() functions, which makes things trivially easy. Others - gcc - don't, meaning you have to do the seed_seq thing if you want your code to be portable. If you want things super-easy, simply pass your chosen seeds through murmur_mix() before handing them to seed() and move on.

The wages of fear: once you have stuffed your magic into a header, the actual application is easy.

#include "zrbj/rng_wrapper.hpp"
#include <random>
#include <typeinfo>

int main ()
{
   zrbj::seeded<std::mt19937> rng(42);

   std::cout << typeid(rng.wrapped_rng).name() << " -> " << rng();
}       

This prints the generator, the 42 and the actual seed to the log, apart from smashing bits to smithereens and stuffing them into the mt19937. Code once, lean back, enjoy.

like image 119
DarthGizka Avatar answered Nov 15 '22 08:11

DarthGizka