Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a C++11 random distribution made of?

Tags:

c++

random

c++11

I am trying to implement the following class:

typedef std::mt19937 Engine;

class Interval 
{
public:
    double upperBoundary;
    double lowerBoundary;
    double generateUniformRandomNumber(Engine& engine);
};

I want the class to work in a multithread environment. Each thread will have its own Engine object instance and it will pass the Engine to objects of any class that has randomized behaviour.

In order to generate random numbers uniformly the C++11 way, the implementation of the generateUniformRandomNumber would have to be something like this:

uniform_real_distribution<double> distribution_; // private member of Interval

double Interval::generateUniformRandomNumber(Engine& engine)
{
    return distribution_(engine); 
}

The problem is that I do not understand C++11 distributions. I know that C++11 random number engines can be very large objects (few kilobytes), but what about distributions? At first I thought that distributions are just simple functors, where operator() is a pure const function, but it seems that it is neither pure nor const. According to the reference, each distribution instance has a reset() member function. That means it has a potentionally big internal state or maybe a cache.

My question is:

  1. Do distributions have an internal state? If yes why? Does the standard say anything about the size of this state?

  2. Is it a good idea to make an implementation as I did? Is there a better way?

like image 221
Martin Drozdik Avatar asked Apr 15 '13 14:04

Martin Drozdik


4 Answers

A distribution may very well and usually will have some state. The standard doesn't give any restrictions here. I can think of a few reasons that state might be used:

  1. A random number distribution very often has some parameters to configure it. For example, a normal distribution has mean and variance parameters. Those are part of its state because they must be held on to between invocations.

  2. The distribution is implemented in terms of some other distribution. You can really just think of this as a more complex parameter. For example, my beta distribution implementation keeps hold of two gamma distributions which each have their own configuration.

  3. The distribution could change over time. There's nothing to say that repeat invocations of a distribution need be independent. This is where the reset member function comes in. Most distributions have independent calls to operator() and so the reset function actually does nothing (its definition is empty). However, if your calls are dependent, reset should bring the distribution back to a state where the next call is independent.

Your implementation seems fine. A uniform_real_distribution<double> is very unlikely to have much more state that the parameters you construct it with.

like image 106
Joseph Mansfield Avatar answered Sep 20 '22 23:09

Joseph Mansfield


Look to the docs for the RandomNumberDistribution template policy...

reset():

Resets the internal state of the distribution object. After a call to this function, the next call to operator() on the distribution object will not be dependent on previous calls to operator().

This implies that calls to operator() can change state that effects subsequent calls to operator(). That is both why reset() exists and why operator() isn't const.

uniform_real_distribution should be a small simple functor, like you said. It would likely just hold the 2 Reals it was constructed with and nothing else. And reset() should do nothing for that one.

like image 30
David Avatar answered Sep 20 '22 23:09

David


Yes, distributions can have internal states. They will typically be small, but if you're concerned about the size, check it.

like image 24
Pete Becker Avatar answered Sep 17 '22 23:09

Pete Becker


The specification for reset() states:

Subsequent uses of d do not depend on values produced by any engine prior to invoking reset.

That implies that normally invocations of operator() may depend on values produced by engines used in prior invocation of operator(). That is, a distribution can cache or otherwise depend on previous engine results.

For example a Bernoulli distribution might only need one bit to produce a result while a given engine provides 32 bits at a time, and so the distribution might cache the 32 bits and not invoke any engine again until 32 values have been generated.

Another example is a distribution (I forget which) where the common algorithm naturally produces two values at a time, so the distribution might save the second value for the next call.

So yes, distributions can have internal state. The standard does not place requirements on the size of this state.

If you're asking if it would be okay to share a distribution between threads then no, that's not a good idea. For one thing doing so is a data-race and results in undefined behavior unless you either add synchronization or make the distribution const (which, even if you can do that with your implementation of the standard library, is not portable). Secondly, the standard only makes guarantees when you use the distributions and engines in a particular way and sharing a distribution between several engines is not the way. It's unlikely that sharing a distribution will actually produce bad data but IMO it's still a good idea not to do that. Instead you might have each thread keep both its own engine and its own distribution.

like image 41
bames53 Avatar answered Sep 20 '22 23:09

bames53