My homework involves generating random integers between 0 and 2^30. Now, in the past we've learned that `rand()` only returns integers up to `RAND_MAX`, that this is less than `UINT_MAX`, and that we can use bit shifting to fill up that `UINT_MAX` capacity. From some reading I've done (here, on SO), I realise that this may not be a very good idea if the distribution of these numbers matters to me. Having said that, my professor has specified this method.
My question is: by how much do I bit shift? Will the difference between `RAND_MAX` and `UINT_MAX` always be such that there is a safe constant by which to bit shift? Or does some initial probing need to be done to determine the shift amount? Should I just keep shifting a little at a time and checking against `UINT_MAX`?
The reason I ask is that `UINT_MAX` is defined to be at least a certain number (65535), but on my machine `UINT_MAX` is much bigger (4294967295). This got me worried that I might finish my homework over the weekend, arrive at school, and find everything not working well enough to submit.
Thanks!
References:
I've read a couple of similar questions but couldn't garner an answer from them.
Is the value of `RAND_MAX` always (2^n)-1?
Generating a random number within range 0 to n, where n can be > `RAND_MAX`
Actually, the second question above makes me question whether this is worth doing at all?
Macro: int RAND_MAX. The value of this macro is an integer constant representing the largest value the rand function can return. In the GNU C Library, it is 2147483647, which is the largest signed integer representable in 32 bits. In other libraries, it may be as low as 32767.
The rand function, declared in stdlib.h, returns a random integer in the range 0 to RAND_MAX (inclusive) every time you call it. On machines using the GNU C library, RAND_MAX is equal to INT_MAX or 2^31 - 1, but it may be as small as 32767.
Your question centres on whether `RAND_MAX` and `UINT_MAX` differ by a simple bit shift. This reduces to the question of whether `UINT_MAX` and `RAND_MAX` are both of the form 2^k - 1. `UINT_MAX` almost assuredly is on any binary computer: if `unsigned int` is 32 bits then k=32, if it is 64 bits then k=64, and so on. Now consider `RAND_MAX`. In most implementations, `RAND_MAX` will also be of the form 2^k - 1. Why? We need to consider how most implementations of `rand()` actually work.
`rand()` typically uses a linear congruential generator (see http://en.wikipedia.org/wiki/Linear_congruential_generator or Knuth, "The Art of Computer Programming, Vol. 2: Seminumerical Algorithms"). Basically, the random numbers form a sequence starting from a seed:

x(k+1) = ( a * x(k) + c ) % m

(i.e. the C library stores the last iterate x(k), and when you call `rand()` it returns x(k+1)).
To be of good quality, the parameters of the generator (a, c, and m) have to be chosen carefully. Quality here chiefly means the period, i.e. how long before the sequence repeats itself. One tension in the choice of these parameters is to make m as close to `UINT_MAX` as possible, so as not to waste potential random bits. If you study the generators, typically the right choice is to make m somewhat smaller than `UINT_MAX`, and a common choice is to make m prime.
Typically you also want `rand()` to be as fast as possible, so you want these operations to be cheap. The cheapest mod to compute is one by a power of two, foo % (1 << k), because it can be implemented as foo & ((1 << k) - 1); a mod by a Mersenne number 2^k - 1 can also be computed cheaply with shifts and adds, since 2^k ≡ 1 (mod 2^k - 1). For the right choice of k, the modulus 2^k - 1 is also prime (a Mersenne prime).
For example, a common choice is k=31, which yields the Mersenne prime 2^31 - 1 = 2147483647. This is the typical choice for 32-bit integers, where UINT_MAX = 2^32 - 1 = 4294967295. For 64-bit integers one has UINT_MAX = 2^64 - 1 = 18446744073709551615, and a choice for RAND_MAX would be 2^61 - 1 = 2305843009213693951.
So, in summary, to answer your question: in most implementations you can assume there is a simple bit shift; however, there is no real guarantee. At the very least you should do a run-time test at program init. If you are using C++, a better practice would be to use a `static_assert` to detect at compile time whether your assumptions hold, and fail the build if they do not. Boost has such a static assert, as does the recently approved C++11 standard. That is, one can do something like (though it may take some work to write a static version of `is_power_of_two_minus_one`):
unsigned int myrand()
{
    static_assert(sizeof(unsigned int) == 4, "unsigned int is not 32 bits");
    static_assert(is_power_of_two_minus_one(RAND_MAX), "RAND_MAX is not of the form 2^k - 1");
    static_assert(is_power_of_two_minus_one(UINT_MAX), "UINT_MAX is not of the form 2^k - 1");
    unsigned int raw_rand = rand();
    // shift raw_rand left and OR in more rand() bits as needed
    return raw_rand;
}
"Actually, the second question above makes me question the worth of doing this at all?"
If your professor told you to do it this way, you should do it this way. It's not crypto-strength certainly, but for the purposes of a homework assignment it will be fine.
As for `RAND_MAX` being 2^n - 1: I'd generally assume so. Computers generate random numbers a fixed number of bits at a time, so if the maximum weren't 2^n - 1, either not all numbers in the range could be returned or entropy would be wasted.
As for the difference being the same on all systems, I'd strongly advise against assuming that. In your code, figure out how many bits each value has and compute the shift dynamically.
And can't you ssh into the school server you'll (eventually) be running this on?