Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to succinctly, portably, and thoroughly seed the mt19937 PRNG?

Tags:

c++

random

c++11

I would argue the greatest flaw with std::random_device is the that it is allowed a deterministic fallback if no CSPRNG is available. This alone is a good reason not to seed a PRNG using std::random_device, since the bytes produced may be deterministic. It unfortunately doesn't provide an API to find out when this happens, or to request failure instead of low-quality random numbers.

That is, there is no completely portable solution: however, there is a decent, minimal approach. You can use a minimal wrapper around a CSPRNG (defined as sysrandom below) to seed the PRNG.

Windows


You can rely on CryptGenRandom, a CSPRNG. For example, you may use the following code:

bool acquire_context(HCRYPTPROV *ctx)
{
    if (!CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, 0)) {
        return CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, CRYPT_NEWKEYSET);
    }
    return true;
}


size_t sysrandom(void* dst, size_t dstlen)
{
    HCRYPTPROV ctx;
    if (!acquire_context(&ctx)) {
        throw std::runtime_error("Unable to initialize Win32 crypt library.");
    }

    BYTE* buffer = reinterpret_cast<BYTE*>(dst);
    if(!CryptGenRandom(ctx, dstlen, buffer)) {
        throw std::runtime_error("Unable to generate random bytes.");
    }

    if (!CryptReleaseContext(ctx, 0)) {
        throw std::runtime_error("Unable to release Win32 crypt library.");
    }

    return dstlen;
}

Unix-Like


On many Unix-like systems, you should use /dev/urandom when possible (although this is not guaranteed to exist on POSIX-compliant systems).

size_t sysrandom(void* dst, size_t dstlen)
{
    char* buffer = reinterpret_cast<char*>(dst);
    std::ifstream stream("/dev/urandom", std::ios_base::binary | std::ios_base::in);
    stream.read(buffer, dstlen);

    return dstlen;
}

Other


If no CSPRNG is available, you might choose to rely on std::random_device. However, I would avoid this if possible, since various compilers (most notably, MinGW) implement it with as a PRNG (in fact, producing the same sequence every time to alert humans that it's not properly random).

Seeding


Now that we have our pieces with minimal overhead, we can generate the desired bits of random entropy to seed our PRNG. The example uses (an obviously insufficient) 32-bits to seed the PRNG, and you should increase this value (which is dependent on your CSPRNG).

std::uint_least32_t seed;    
sysrandom(&seed, sizeof(seed));
std::mt19937 gen(seed);

Comparison To Boost


We can see parallels to boost::random_device (a true CSPRNG) after a quick look at the source code. Boost uses MS_DEF_PROV on Windows, which is the provider type for PROV_RSA_FULL. The only thing missing would be verifying the cryptographic context, which can be done with CRYPT_VERIFYCONTEXT. On *Nix, Boost uses /dev/urandom. IE, this solution is portable, well-tested, and easy-to-use.

Linux Specialization


If you're willing to sacrifice succinctness for security, getrandom is an excellent choice on Linux 3.17 and above, and on recent Solaris. getrandom behaves identically to /dev/urandom, except it blocks if the kernel hasn't initialized its CSPRNG yet after booting. The following snippet detects if Linux getrandom is available, and if not falls back to /dev/urandom.

#if defined(__linux__) || defined(linux) || defined(__linux)
#   // Check the kernel version. `getrandom` is only Linux 3.17 and above.
#   include <linux/version.h>
#   if LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0)
#       define HAVE_GETRANDOM
#   endif
#endif

// also requires glibc 2.25 for the libc wrapper
#if defined(HAVE_GETRANDOM)
#   include <sys/syscall.h>
#   include <linux/random.h>

size_t sysrandom(void* dst, size_t dstlen)
{
    int bytes = syscall(SYS_getrandom, dst, dstlen, 0);
    if (bytes != dstlen) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    return dstlen;
}

#elif defined(_WIN32)

// Windows sysrandom here.

#else

// POSIX sysrandom here.

#endif

OpenBSD


There is one final caveat: modern OpenBSD does not have /dev/urandom. You should use getentropy instead.

#if defined(__OpenBSD__)
#   define HAVE_GETENTROPY
#endif

#if defined(HAVE_GETENTROPY)
#   include <unistd.h>

size_t sysrandom(void* dst, size_t dstlen)
{
    int bytes = getentropy(dst, dstlen);
    if (bytes != dstlen) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    return dstlen;
}

#endif

Other Thoughts


If you need cryptographically secure random bytes, you should probably replace the fstream with POSIX's unbuffered open/read/close. This is because both basic_filebuf and FILE contain an internal buffer, which will be allocated via a standard allocator (and therefore not wiped from memory).

This could easily be done by changing sysrandom to:

size_t sysrandom(void* dst, size_t dstlen)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd == -1) {
        throw std::runtime_error("Unable to open /dev/urandom.");
    }
    if (read(fd, dst, dstlen) != dstlen) {
        close(fd);
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    close(fd);
    return dstlen;
}

Thanks


Special thanks to Ben Voigt for pointing out FILE uses buffered reads, and therefore should not be used.

I would also like to thank Peter Cordes for mentioning getrandom, and OpenBSD's lack of /dev/urandom.


In a sense, this can't be done portably. That is, one can conceive a valid fully-deterministic platform running C++ (say, a simulator which steps the machine clock deterministically, and with "determinized" I/O) in which there is no source of randomness to seed a PRNG.


You can use a std::seed_seq and fill it to at least the requires state size for the generator using Alexander Huszagh's method of getting the entropy:

size_t sysrandom(void* dst, size_t dstlen); //from Alexander Huszagh answer above

void foo(){

    std::array<std::mt19937::UIntType, std::mt19937::state_size> state;
    sysrandom(state.begin(), state.length*sizeof(std::mt19937::UIntType));
    std::seed_seq s(state.begin(), state.end());

    std::mt19937 g;
    g.seed(s);
}

If there was a proper way to fill or create a SeedSequence from a UniformRandomBitGenerator in the standard library using std::random_device for seeding properly would be much simpler.


The implementation I am working on takes advantage of the state_size property of the mt19937 PRNG to decide how many seeds to provide on initialization:

using Generator = std::mt19937;

inline
auto const& random_data()
{
    thread_local static std::array<typename Generator::result_type, Generator::state_size> data;
    thread_local static std::random_device rd;

    std::generate(std::begin(data), std::end(data), std::ref(rd));

    return data;
}

inline
Generator& random_generator()
{
    auto const& data = random_data();

    thread_local static std::seed_seq seeds(std::begin(data), std::end(data));
    thread_local static Generator gen{seeds};

    return gen;
}

template<typename Number>
Number random_number(Number from, Number to)
{
    using Distribution = typename std::conditional
    <
        std::is_integral<Number>::value,
        std::uniform_int_distribution<Number>,
        std::uniform_real_distribution<Number>
    >::type;

    thread_local static Distribution dist;

    return dist(random_generator(), typename Distribution::param_type{from, to});
}

I think there is room for improvement because std::random_device::result_type could differ from std::mt19937::result_type in size and range so that should really be taken into account.

A note about std::random_device.

According to the C++11(/14/17) standard(s):

26.5.6 Class random_device [ rand.device ]

2 If implementation limitations prevent generating non-deterministic random numbers, the implementation may employ a random number engine.

This means the implementation may only generate deterministic values if it is prevented from generating non-deterministic ones by some limitation.

The MinGW compiler on Windows famously does not provide non-deterministic values from its std::random_device, despite them being easily available from the Operating System. So I consider this a bug and not likely a common occurrence across implementations and platforms.


There's nothing wrong with seeding by using time, assuming you don't need it to be secure (and you didn't say this was necessary). The insight is that you can use hashing to fix the non-randomness. I've found this works adequately in all cases, including and in-particular for heavy Monte Carlo simulations.

One nice feature of this approach is that it generalizes to initialization from other not-really-random sets of seeds. For example, if you want each thread to have its own RNG (for threadsafety), you can just initialize based on hashed thread ID.

The following is a SSCCE, distilled from my codebase (for simplicity; some OO support structures elided):

#include <cstdint> //`uint32_t`
#include <functional> //`std::hash`
#include <random> //`std::mt19937`
#include <iostream> //`std::cout`

static std::mt19937 rng;

static void seed(uint32_t seed) {
    rng.seed(static_cast<std::mt19937::result_type>(seed));
}
static void seed() {
    uint32_t t = static_cast<uint32_t>( time(nullptr) );
    std::hash<uint32_t> hasher; size_t hashed=hasher(t);
    seed( static_cast<uint32_t>(hashed) );
}

int main(int /*argc*/, char* /*argv*/[]) {
    seed();
    std::uniform_int_distribution<> dis(0, 5);
    std::cout << dis(rng);
}