Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get a "random" number in OpenCL

Tags:

opencl

I was solving this "no random" issue for last few days and I came up with three different approaches:

  1. Xorshift - I created generator based on this one. All you have to do is provide one uint2 number (seed) for whole kernel and every work item will compute his own rand number

    // 'randoms' is uint2 passed to kernel
    uint seed = randoms.x + globalID;
    uint t = seed ^ (seed << 11);  
    uint result = randoms.y ^ (randoms.y >> 19) ^ (t ^ (t >> 8));
    
  2. Java random - I used code from .next(int bits) method to generate random number. This time you have to provide one ulong number as seed.

    // 'randoms' is ulong passed to kernel
    ulong seed = randoms + globalID;
    seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
    uint result = seed >> 16;
    
  3. Just generate all on CPU and pass it to kernel in one big buffer.

I tested all three approaches (generators) in my evolution algorithm computing Minimum Dominating Set in graphs.

I like the generated numbers from the first one, but it looks like my evolution algorithm doesn't.

Second generator generates numbers that has some visible pattern but my evolution algorithm likes it that way anyway and whole thing run little faster than with the first generator.

But the third approach shows that it's absolutely fine to just provide all numbers from host (cpu). First I though that generating (in my case) 1536 int32 numbers and passing them to GPU in every kernel call would be too expensive (to compute and transfer to GPU). But it turns out, it is as fast as my previous attempts. And CPU load stays under 5%.

BTW, I also tried MWC64X Random but after I installed new GPU driver the function mul_hi starts causing build fail (even whole AMD Kernel Analyer crashed).


the following is the algorithm used by the java.util.Random class according to the doc:

(seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1)

See the documentation for its various implementations. Passing the worker's id in for the seed and looping a few time should produce decent randomness

or another metod would be to have some random operations occur that are fairly ceratain to overflow:

 long rand= yid*xid*as_float(xid-yid*xid);
 rand*=rand<<32^rand<<16|rand;
 rand*=rand+as_double(rand);

with xid=get_global_id(0); and yid= get_global_id(1);


I am currently implementing a Realtime Path Tracer. You might already know that Path Tracing requires many many random numbers.
Before generating random numbers on the GPU I simply generated them on the CPU (using rand(), which sucks) and passed them to the GPU.
That quickly became a bottleneck.
Now I am generating the random numbers on the GPU with the Park-Miller Pseudorandom Number Generator (PRNG).
It is extremely simple to implement and achieves very good results.
I took thousands of samples (in the range of 0.0 to 1.0) and averaged them together.
The resulting value was very close to 0.5 (which is what you would expect). Between different runs the divergence from 0.5 was around 0.002. Therefore it has a very uniform distribution.

Here's a paper describing the algorithm:
http://www.cems.uwe.ac.uk/~irjohnso/coursenotes/ufeen8-15-m/p1192-parkmiller.pdf
And here's a paper about the above algorithm optimized for CUDA (which can easily be ported to OpenCL): http://www0.cs.ucl.ac.uk/staff/ucacbbl/ftp/papers/langdon_2009_CIGPU.pdf

Here's an example of how I'm using it:

int rand(int* seed) // 1 <= *seed < m
{
    int const a = 16807; //ie 7**5
    int const m = 2147483647; //ie 2**31-1

    *seed = (long(*seed * a))%m;
    return(*seed);
}

kernel random_number_kernel(global int* seed_memory)
{
    int global_id = get_global_id(1) * get_global_size(0) + get_global_id(0); // Get the global id in 1D.

    // Since the Park-Miller PRNG generates a SEQUENCE of random numbers
    // we have to keep track of the previous random number, because the next
    // random number will be generated using the previous one.
    int seed = seed_memory[global_id];

    int random_number = rand(&seed); // Generate the next random number in the sequence.

    seed_memory[global_id] = *seed; // Save the seed for the next time this kernel gets enqueued.
}

The code serves just as an example. I have not tested it.
The array "seed_memory" is being filled with rand() only once before the first execution of the kernel. After that, all random number generation is happening on the GPU. I think it's also possible to simply use the kernel id instead of initializing the array with rand().


It seems OpenCL does not provide such functionality. However, some people have done some research on that and provide BSD licensed code for producing good random numbers on GPU.


This is my version of OpenCL float pseudorandom noise, using trigonometric function

//noise values in range if 0.0 to 1.0
static float noise3D(float x, float y, float z) {
    float ptr = 0.0f;
    return fract(sin(x*112.9898f + y*179.233f + z*237.212f) * 43758.5453f, &ptr);
}

__kernel void fillRandom(float seed, __global float* buffer, int length) {
    int gi = get_global_id(0);
    float fgi = float(gi)/length;
    buffer[gi] = noise3D(fgi, 0.0f, seed);
}

You can generate 1D or 2D noize by passing to noise3D normalized index coordinates as a first parameters, and the random seed (generated on CPU for example) as a last parameter.

Here are some noise pictures generated with this kernel and different seeds:

noise1 noise2


GPU don't have good sources of randomness, but this can be easily overcome by seeding a kernel with a random seed from the host. After that, you just need an algorithm that can work with a massive number of concurrent threads.

This link describes a Mersenne Twister implementation using OpenCL: Parallel Mersenne Twister. You can also find an implementation in the NVIDIA SDK.