Today I talked with a friend who is trying to run Monte Carlo simulations on a GPU. Interestingly, he told me that he wanted to draw random numbers on different processors and assumed that they were uncorrelated. But they were not.
The question is whether there exists a method to draw independent sets of numbers on several GPUs. He thought that taking a different seed for each of them would solve the problem, but it does not.
If any clarifications are needed, please let me know and I will ask him to provide more details.
To generate completely independent random numbers, you need to use a parallel random number generator. Essentially, you choose a single seed and it generates M independent random number streams. So on each of the M GPUs you could then generate random numbers from independent streams.
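As a rough illustration of the "one seed, M streams" idea, here is a minimal host-side C++ sketch that uses block splitting: every stream is a copy of the same base generator, advanced to the start of its own block. The helper name `make_block_streams` and its parameters are hypothetical, and `std::mt19937_64` merely stands in for a real parallel generator. The streams are guaranteed not to overlap only while each one draws at most `block_size` values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: build `num_streams` generators from ONE seed.
// Stream k starts k * block_size draws into the base sequence, so the
// streams cannot overlap while each one draws at most block_size values.
std::vector<std::mt19937_64> make_block_streams(std::uint64_t seed,
                                                std::size_t num_streams,
                                                unsigned long long block_size)
{
    assert(block_size > 0);  // a zero-length block would make all streams identical
    std::vector<std::mt19937_64> streams;
    streams.reserve(num_streams);
    std::mt19937_64 base(seed);
    for (std::size_t k = 0; k < num_streams; ++k) {
        streams.push_back(base);   // copy the current state for stream k
        base.discard(block_size);  // advance the base to the start of the next block
    }
    return streams;
}
```

Calling `make_block_streams(42, 4, 1000000)` would give four generators, one per device. Note that `discard` on a Mersenne Twister steps through the sequence one value at a time, so libraries designed for parallel use typically offer a much cheaper skip-ahead.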
When dealing with multiple GPUs you need to be aware that you want:
It turns out that generating random numbers on each GPU core is tricky (see this question I asked a while back). In my experience experimenting with GPUs and random numbers, you only get a speed-up from generating them on the GPU if you generate a large batch at once.
Instead, I would generate random numbers on the CPU, since:
To answer your question in the comments: What do random numbers depend on?
A very basic random number generator is the linear congruential generator. Although this generator has been surpassed by newer methods, it should give you an idea of how they work. Basically, the ith random number depends only on the (i-1)th random number, so two different seeds simply start at two different points of the same sequence. As you point out, if you run two streams long enough, they will overlap. The big problem is that you don't know when they will overlap.
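To make the overlap concrete, here is a small C++ sketch of a toy linear congruential generator. The parameters (a = 5, c = 3, m = 16) are assumptions chosen only so that the whole cycle is short enough to inspect, and `steps_until_overlap` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Toy parameters for illustration only; with these values the
// generator visits all 16 states before repeating.
constexpr std::uint32_t kA = 5, kC = 3, kM = 16;

// One LCG step: x_i = (a * x_{i-1} + c) mod m.
constexpr std::uint32_t lcg_next(std::uint32_t x)
{
    return (kA * x + kC) % kM;
}

// Number of steps the stream seeded with seed_a needs before it reaches
// the starting state of the stream seeded with seed_b.
std::uint32_t steps_until_overlap(std::uint32_t seed_a, std::uint32_t seed_b)
{
    assert(seed_a < kM && seed_b < kM);
    std::uint32_t x = seed_a;
    std::uint32_t steps = 0;
    while (x != seed_b) {  // terminates because the period is full
        x = lcg_next(x);
        ++steps;
    }
    return steps;
}
```

With seeds 1 and 7, `steps_until_overlap(1, 7)` returns 14: after 14 draws the first stream reproduces the second one exactly. A production generator has a vastly longer period, but the same effect applies, and the distance between two arbitrary seeds is generally not known in advance.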
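For generating iid uniform variables, you just have to initialize your generators with different seeds. With CUDA, you may use the NVIDIA cuRAND library, which provides several generators, including a Mersenne Twister variant (MTGP32). Note that the curandState type used below belongs to cuRAND's default XORWOW generator, not the Mersenne Twister.
For example, the following kernel, launched with 100 threads, sets up the generator states used to draw 10 samples of a uniform random vector in R^10: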
#include <curand_kernel.h>

/* Initialise one cuRAND state per thread. */
__global__ void setup_kernel(curandState *state, int pseed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    /* 10 different seeds for uncorrelated random variables */
    int seed = id % 10 + pseed;
    /* each thread gets a different sequence number, no offset */
    curand_init(seed, id, 0, &state[id]);
}