Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Improving "randomness" when extending the range of rand()

I am playing around with a pair of algorithms I found on other SO posts (listed in the references below), and am trying to find out how to improve the distribution. I am effectively extending the range of a random number by doubling the number of bits, and want to ensure that the distribution is as uniform as possible, while removing (or at least reducing) the effects of modulo bias and other artifacts for a shuffling algorithm that would be using the result of my modified random number generator.

So, it is to my understanding that if I initialize my RNG with a constant seed (ie: srand(1)) I will get the same pattern of deterministic outputs from calling rand() in a for loop. Now, if I were to initialize my seed via srand(time(NULL)), it would be a different pattern, but it still might not help me with the following problem: I am trying to figure out if I were to implement the following algorithm:

  • Take two random numbers a,b
  • Calculate a*(RAND_MAX+1)+b

Would I be able to:

  1. Generate every possible coordinate pair (a,b), where a,b ∈ Z+ on [0, RAND_MAX] (a and b are positive integers between zero and RAND_MAX inclusive).
  2. Maximize the uniformity of the entire distribution (ie: an optimally flat histogram).

While the output of rand() is supposed to be uniformly distributed, I don't know if it's guaranteed to give me values for N,N+1 calls to rand each loop and give me every pair listing in point (1) before the random sequence repeats itself again. My new random number generator could theoretically generate random values on [0, RAND_MAX ^ 2], but I don't know if there might be "holes" aka values in this range that can never be generated by my algorithm.

I attempted to get further on this question myself, but I couldn't find information on how long a random sequence generated be rand() in C goes until it repeats itself. Lacking this and other information, I couldn't figure out whether or not it is possible to generate every pair (a,b).

So, using rand(), is it possible to achieve point (1) and if so, are there any solid suggestions on how to optimize its "randomness" according to point (2)?

Thank you for your time and assistance.

Update

I later revisited this problem and simulated it using an 8-bit PRNG. While it could indeed generate every possible coordinate pair, the distribution was actually quite interesting, and definitely not uniform. In the end, I read several articles/papers on PRNGs and used a Mersenne Twiser algorithm to generate the additional bits needed (i.e. MT19937-64).

References


  1. Extend rand() max range, Accessed 2014-05-07, <https://stackoverflow.com/questions/9775313/extend-rand-max-range>
  2. Shuffle array in C, Accessed 2014-05-07, <https://stackoverflow.com/questions/6127503/shuffle-array-in-c>
like image 760
Cloud Avatar asked May 07 '14 16:05

Cloud


People also ask

Is rand () truly random?

However, the numbers generated by the rand() are not random because it generates the same sequence each time the code executed.

What is the range of rand ()?

rand() returns a pseudo-random number in the range of [0, RAND_MAX). RAND_MAX: is a constant whose default value may vary between implementations but it is granted to be at least 32767.

What is the difference between rand () and srand () in C++?

The rand() function in C++ is used to generate random numbers; it will generate the same number every time we run the program. In order to seed the rand() function, srand(unsigned int seed) is used. The srand() function sets the initial point for generating the pseudo-random numbers.

Why does rand () return the same number?

The RAND function in stand-alone applications generates the same numbers each time you run your application because the uniform random number generator that RAND uses is initialized to same state when the application is loaded.


1 Answers

Assumptions

As pointed out in the comments, the behaviour of rand() is implementation dependent. So, let's make a few simplifying assumptions to get to the point of the question:

  • rand() can generate all values from 0 to RAND_MAX. Justfication: If it could not, then it would be even harder to generate all possible pairs (a, b).
  • rand() generates a statistically random sequence. Justification: The result of composing two random functions (or the same one twice) is only as good as the base random function.

Of course, we shouldn't expect the result to be better than the building blocks, so any deficiencies in the rand() implementation will reflect itself in any functions composed from it.

Holes in the distribution

Seeding rand() generates a deterministic sequence for a given seed, as the seed determines the PRNG's initial state. The sequence's maximum period is 2N where N is the number of bits in the state. Note that the state may in fact have more bits than RAND_MAX, we'll assume RAND_MAX = (2N - 1) for this section. Because it is a sequence, generating two successive "random" N-bit values a and b means that ab. Therefore, the method a*(RAND_MAX+ 1)+b will have some holes.

A little explanation on ab: PRNGs work by maintaining an internal state of N bits. It uses that state uniquely to determine its next state, so once the same state recurs, the sequence starts to repeat itself. The number of states gone through before the sequence starts repeating itself is called the period. So, technically we could have a = b, but that implies a period of 1, and that's a very bad PRNG. For more information, a helpful answer on PRNG periods has been posted on the Software Engineering site.

An algorithm without holes

One way to allow successive "random" calls to be equal is to generate 2 N-bit numbers, but consider only a certain number of bits significant, i.e. discard some. Now, we can have a = b, though with very slightly less probability than another random number c. Note this is similar to how Java's random number generator works. It is seeded with 48 bits, but outputs a 32-bit random number, discarding 16 bits (assuming # of bits in seed = # of bits in state).

However, since you need values larger than RAND_MAX, what you could do is use the above method, and then concatenate the bits, until you get enough bits to reach the desired maximum (though again, the distribution is not quite uniform).

like image 55
DPenner1 Avatar answered Sep 19 '22 18:09

DPenner1