Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is repeatedly seeding a random number generator a reasonable hash function?

I wish to generate a large amount of random data, which is reproducible for a given key, comprising a list of numbers:

[a, b, c, d, e, ...]

Is the following a good or sensible way to get a RNG into a state to generate random data, in such a way that for each n-tuple [a, b, c, ..., n], that data is uncorrelated with the output for the "adjacent" n-tuples [a+1, b, c, ..., n], [a, b+1, c, ..., n], etc.

srand(a);
srand(rand() * b);
srand(rand() * c);
...
srand(rand() * n);

# generate random data:
for (int i=0; i < 100; +i)
  printf("%d", rand());

I think this question boils down to the following: is rand_hash a good hash function for the 2-tuple (a, b)?

int rand_hash(int a, int b) { 
  srand(a); 
  srand(rand() * b); 
  return rand();
}

NB: I don't wish to imply that srand and rand are any particular implementation of an RNG. Assume for the sake of argument that we're using a good Mersenne Twister code.

Edit: If it isn't clear, by "reasonable hash function" I mean the following. In the restricted case of a 2-tuple [a, b], then the output of rand_hash should be uniform over the range of int, and (typically) there should be no correlation between the magnitude in the change of a or b and the magnitude of the change in the return value.

like image 691
Nick Avatar asked Oct 28 '11 18:10

Nick


1 Answers

No, this is not a reasonable approach.

  1. You don't know what the implementation of rand is. Random number generators are designed to provide approximately uniformly distributed numbers over a period of several generated mnumbers. They are not designed to provide uniformly distributed numbers over the set of (32 bit) seeds. In your hypothetical mersenne_twister case, the random number generator has state much larger than the integer you supply to srand (specifically, 624*sizeof(int)). Most of the power the RNG has to ensure its output is random and uniform are from that additional state, and you took that away. (The seed can be only one of 2^32 states)
  2. If you ever upgrade your compiler or libraries or something similar, anything you might have serialized to disk will become unreadable. (If rand is a black box, nobody says that tomorrow's implementation matches today's).
  3. Your hashing function's output returns the same thing for the same inputs to srand. Therefore, you already have a hash -- the input to srand. The RNG generates the same output for a given input to srand. Therefore the number of hashes you may obtain is no greater than just returning the hash you would have already calculated. If your initial hash into srand is of poor distribution for a hash table, then scale the hash appropriately such that it performs well in your table.
  4. For some implementations of rand, this performs extremely poorly. Consider a linear congruential generator (which is more common with C libraries because it has state of sizeof(int) -- e.g. the BSD generator ). A LCG follows the form xNext = a*xCurrent + b. Consider:

    static int seed = 0;
    
    void srand(int newSeed)
    {
        seed = newSeed;
    }
    
    int rand()
    {
        seed = (int) ((1103515245 * ((unsigned int)seed) + 12345) & 0x7fffffffUL); 
        return seed;
    }
    

    Note that this (common) type of generator produces hash values easily correlated to your input values.

like image 99
Billy ONeal Avatar answered Oct 12 '22 01:10

Billy ONeal