
How do I use rand_r and how do I use it in a thread safe way?

I am trying to learn how to use rand_r, and after reading this question I am still a little confused. Can someone please take a look and point out what I'm missing? To my understanding, rand_r takes a pointer to some value (a piece of memory holding an initial value) and uses it to generate a new number every time it is called. Each thread that calls rand_r should supply its own pointer (its own piece of memory) so that the threads get "actual random" numbers independently of each other. That's why this:

    unsigned int globalSeed;   // rand_r takes an unsigned int *

    // thread 1
    rand_r(&globalSeed);

    // thread 2
    rand_r(&globalSeed);

is the wrong way of using it. If I have

    unsigned int seed1, seed2;   // rand_r takes an unsigned int *

    // thread 1
    rand_r(&seed1);

    // thread 2
    rand_r(&seed2);

would this be the right way to generate "true random" numbers between threads?


EDIT: additional questions after reading answers to the above part:

  1. If in thread 1 I need a random number between 1 and n, should I do (rand_r(&seed1) % (n-1)) + 1? Or is there another common way of doing this?
  2. Is it right or normal if the memory for the seed is dynamically allocated?
derrdji asked Oct 19 '10 23:10



1 Answer

That's correct. What you're doing in the first case defeats the thread safety of rand_r. Many non-thread-safe functions keep persistent state between calls (the random seed, in the case of rand).

With the thread-safe variant, you actually provide a thread-specific piece of data (seed1 and seed2) to ensure the state is not shared between threads.
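As a minimal sketch of that idea, assuming POSIX threads, each thread below is handed its own record holding a private seed. The names `struct worker`, `worker_main`, and `run_two_workers` are made up for illustration and are not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

#define DRAWS 4

/* Hypothetical per-thread record: each thread owns its seed and its results. */
struct worker {
    unsigned int seed;   /* private rand_r state, never shared */
    int draws[DRAWS];    /* numbers this thread generated */
};

/* Thread entry point: it only touches the record it was handed. */
static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (int i = 0; i < DRAWS; i++)
        w->draws[i] = rand_r(&w->seed);
    return NULL;
}

/* Starts two threads on two separate records and waits for both.
   Returns 0 on success, -1 if a thread could not be created or joined. */
static int run_two_workers(struct worker *a, struct worker *b)
{
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, worker_main, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker_main, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    int ra = pthread_join(ta, NULL);
    int rb = pthread_join(tb, NULL);
    return (ra == 0 && rb == 0) ? 0 : -1;
}
```

A caller would set `a.seed` and `b.seed` to different values before calling `run_two_workers(&a, &b)`. Because no state is shared, each thread's output depends only on its own starting seed, not on how the threads are scheduled.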

Keep in mind that this doesn't make the numbers truly random, it just makes the sequences independent of each other. rand_r is deterministic, so if you start two threads with the same seed value, you'll get the same sequence in both.
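The determinism is easy to see in a small sketch. `fill_sequence` and `seed_for_thread` are hypothetical helpers, and the way the second one spreads a base seed across threads is just one assumed scheme, not a recommendation from the original answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>

/* Fills out[0..count-1] with successive rand_r values starting from `seed`.
   The seed is passed by value, so the caller's copy is left unchanged. */
static void fill_sequence(unsigned int seed, int *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = rand_r(&seed);
}

/* Hypothetical helper: derives a different starting seed for each thread
   index from one base seed. Multiplying by an odd constant modulo 2^32 maps
   distinct indices to distinct values, so no two indices share a seed. */
static unsigned int seed_for_thread(unsigned int base, unsigned int thread_index)
{
    return base + thread_index * 2654435761u;
}
```

Calling `fill_sequence` twice with the same seed fills both arrays identically, which is why each thread should start from its own seed value, for example `seed_for_thread(base, i)` for thread `i`.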

By way of example, let's say you get a random sequence 2, 3, 5, 7, 11, 13, 17 given an initial seed of 0. With a shared seed, alternating calls to rand_r from two different threads would cause this:

    thread 1                thread 2
       <---  2
                             3 --->
       <---  5
                             7 --->
       <--- 11
                            13 --->
       <--- 17

and that's the best case - you may actually find that the shared state gets corrupted since the updates on it may not be atomic.

With non-shared state (with a and b representing the two different sources of the random numbers):

    thread 1                thread 2
       <---  2a
                             2b --->
       <---  3a
                             3b --->
       <---  5a
                             5b --->
                  ::

Some thread-safe calls require you to provide the thread-specific state like this, others can create thread-specific data under the covers (using a thread ID or similar information) so that you never need to worry about it, and you can use exactly the same source code in threaded and non-threaded environments. I prefer the latter myself, simply because it makes my life easier.
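One way to get that "under the covers" behaviour yourself is a thin wrapper around rand_r that keeps the seed in thread-local storage. This is only a sketch, assuming a C11 compiler with `_Thread_local` support; `my_srand` and `my_rand` are made-up names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* One seed per thread, kept in C11 thread-local storage.
   The default of 1 is an assumption, mirroring rand() before srand(). */
static _Thread_local unsigned int tls_seed = 1;

/* Hypothetical wrapper: sets the calling thread's seed only. */
static void my_srand(unsigned int seed)
{
    tls_seed = seed;
}

/* Hypothetical wrapper: advances and uses the calling thread's seed only. */
static int my_rand(void)
{
    return rand_r(&tls_seed);
}
```

Callers use `my_srand` and `my_rand` exactly as they would use `srand` and `rand`, in threaded and non-threaded code alike, and never see the seed.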


Additional stuff for edited question:

> If in thread 1, I need a random number between 1 to n, should I do '(rand_r(&seed1) % (n-1)) + 1', or there is other common way of doing this?

Assuming you want a value between 1 and n inclusive, use (rand_r(&seed1) % n) + 1. The first bit gives you a value from 0 to n-1 inclusive, then you add 1 to get the desired range.
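One caveat with the plain modulo form: unless n divides RAND_MAX + 1 evenly, the lower results come up slightly more often than the higher ones. If that matters, a common workaround is to reject the uneven tail of the range and draw again. The following is a sketch of that approach; `rand_1_to_n` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Returns a value in [1, n] with every value equally likely,
   given that rand_r's own outputs are uniform. Requires 1 <= n <= RAND_MAX. */
static int rand_1_to_n(unsigned int *seed, int n)
{
    assert(n >= 1 && n <= RAND_MAX);

    /* rand_r has RAND_MAX + 1 possible outputs; find the largest
       multiple of n that fits in that count. */
    unsigned int span  = (unsigned int)RAND_MAX + 1u;
    unsigned int limit = span - span % (unsigned int)n;

    unsigned int r;
    do {
        r = (unsigned int)rand_r(seed);
    } while (r >= limit);   /* discard the leftover values past the last full block */

    return (int)(r % (unsigned int)n) + 1;
}
```

For example, `rand_1_to_n(&seed1, 6)` simulates a die roll. For small n the difference from `(rand_r(&seed1) % n) + 1` is tiny, so the simpler expression is often good enough.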

> Is it right or normal if the memory for the seed is dynamically allocated?

The seed has to persist for as long as you're using it. You could dynamically allocate it in the thread, but you could also declare it as a local variable in the thread's top-level function. In both cases, you'll need to communicate its address down to the lower levels somehow (unless your thread is just that one function, which is unlikely).

You could either pass it down through the function calls or set up a global array somehow where the lower levels can discover the correct seed address.
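The first option, passing the address down through the calls, might look like this sketch. The seed is a local variable in the thread's top-level function, so it lives exactly as long as the thread's work. `struct job`, `thread_main`, `roll_sum`, and `roll_die` are hypothetical names; `thread_main` has the signature that `pthread_create` expects.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Hypothetical job description handed to the thread. */
struct job {
    unsigned int initial_seed;  /* starting seed chosen by the creator */
    int rolls;                  /* how many dice to roll */
    int total;                  /* result written by the thread */
};

/* Lowest level: needs the seed address to call rand_r. */
static int roll_die(unsigned int *seed)
{
    return rand_r(seed) % 6 + 1;
}

/* Middle level: does not use the seed itself, only forwards it. */
static int roll_sum(unsigned int *seed, int rolls)
{
    int sum = 0;
    for (int i = 0; i < rolls; i++)
        sum += roll_die(seed);
    return sum;
}

/* Top-level thread function: owns the seed as a local variable. */
static void *thread_main(void *arg)
{
    struct job *j = arg;
    unsigned int seed = j->initial_seed;  /* valid until this function returns */
    j->total = roll_sum(&seed, j->rolls);
    return NULL;
}
```

The cost is visible in `roll_sum`: every function between the top level and the rand_r call has to carry the extra parameter.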

Alternatively, since you need a global array anyway, you can have a global array of seeds rather than seed addresses, which the lower levels could use to discover their seed.

You would probably (in both cases of using the global array) have a keyed structure containing the thread ID as a key and the seed to use. You would then have to write your own rand() routine which located the correct seed and called rand_r() with that.
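On POSIX systems you don't have to build that keyed structure by hand, because `pthread_key_create`, `pthread_getspecific`, and `pthread_setspecific` already provide per-thread lookup. Here is a rough sketch of such a wrapper, with a dynamically allocated seed per thread. `keyed_rand` and `thread_seed` are made-up names, and the default seed of 1 is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t seed_key;
static pthread_once_t seed_key_once = PTHREAD_ONCE_INIT;

/* Runs once per process. free() is registered as the destructor, so a
   thread's seed is released when that thread exits. For brevity this
   sketch does not handle pthread_key_create failing. */
static void make_seed_key(void)
{
    (void)pthread_key_create(&seed_key, free);
}

/* Returns the calling thread's seed, allocating it on first use.
   Returns NULL if the allocation or registration fails. */
static unsigned int *thread_seed(void)
{
    pthread_once(&seed_key_once, make_seed_key);

    unsigned int *seed = pthread_getspecific(seed_key);
    if (seed == NULL) {
        seed = malloc(sizeof *seed);
        if (seed == NULL)
            return NULL;
        *seed = 1;  /* assumed default starting seed */
        if (pthread_setspecific(seed_key, seed) != 0) {
            free(seed);
            return NULL;
        }
    }
    return seed;
}

/* Hypothetical rand() replacement: finds this thread's seed, then calls
   rand_r with it. Returns -1 if no seed could be set up. */
static int keyed_rand(void)
{
    unsigned int *seed = thread_seed();
    return seed ? rand_r(seed) : -1;
}
```

A thread that wants a specific starting point can write through the pointer first, for example `*thread_seed() = 42;` after checking it for NULL.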

This is why I prefer library routines which do this under the covers with thread-specific data.

paxdiablo answered Oct 06 '22 10:10