 

Are the first 32 bits of an md5 hash just as "random" as any other substring?

I'm looking to create a 32-bit hash of some data objects. Since I don't feel like writing my own hash function and md5 is available, my current approach is to use the first 32 bits (i.e. first 8 hex digits) from an md5 hash. Is this acceptable?

In other words, are the first 32 bits of an md5 hash just as "random" as any other substring? Or is there any reason I'd prefer, say, the last 32 bits? Or perhaps XORing the four 32-bit substrings together?

Some preemptive clarifications:

  • These hashes don't need to be cryptographically secure.
  • I'm not concerned with the performance of md5--it is more than fast enough for my needs.
  • These hashes just need to be "random" enough that collisions are rare.
  • In this system, the number of items shouldn't exceed 10,000 (realistically it's probably not going to get half that high). So in the worst case the probability of encountering any collisions at all should be about 1% (assuming a sufficiently "random" hash is found).
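The "about 1%" figure in the last point can be checked with the standard birthday approximation. The following is a minimal sketch that assumes the 32-bit hash values are uniformly distributed; the function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n_items: int, bits: int = 32) -> float:
    """Approximate chance of at least one collision among n_items
    hash values drawn uniformly from a space of 2**bits values."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N))
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * space))


if __name__ == "__main__":
    for n in (5_000, 10_000):
        print(f"{n} items: {collision_probability(n):.4%}")
```

Under that assumption, 10,000 items give a little over 1%, and 5,000 items give roughly a quarter of that, since the probability grows with the square of the item count.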
Kip asked May 13 '09 20:05



1 Answer

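For any good hash function, each output bit should be approximately random and independent of the others, so every 32-bit substring is as "random" as any other. You should therefore be safe using just the first 32 bits of an MD5 hash; taking the last 32 bits, or XORing the four 32-bit words together, should not improve the distribution.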
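For reference, the three variants mentioned in the question could be sketched as follows. This uses Python's standard `hashlib`; the function names are invented for illustration, and big-endian byte order is an arbitrary choice made so that the first variant matches the first 8 hex digits of the digest.

```python
import hashlib
import struct


def md5_first32(data: bytes) -> int:
    """First 32 bits of the MD5 digest (same as the first 8 hex digits)."""
    digest = hashlib.md5(data).digest()  # 16 bytes
    return int.from_bytes(digest[:4], "big")


def md5_last32(data: bytes) -> int:
    """Last 32 bits of the MD5 digest (same as the last 8 hex digits)."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[-4:], "big")


def md5_xor32(data: bytes) -> int:
    """XOR of the four 32-bit words that make up the digest."""
    a, b, c, d = struct.unpack(">4I", hashlib.md5(data).digest())
    return a ^ b ^ c ^ d


if __name__ == "__main__":
    sample = b"some data object"
    print(f"{md5_first32(sample):08x}")
    print(f"{md5_last32(sample):08x}")
    print(f"{md5_xor32(sample):08x}")
```

All three return an integer in the range 0 to 2**32 - 1. Whichever one is picked, it has to be used consistently, since the three values for the same input will generally differ.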

Alternatively, you could use CRC32, which should be much faster to compute (and the code is about 20 lines).
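Where a library implementation is available, there is no need to write those lines by hand. As a sketch, assuming Python and its standard `zlib` module rather than a hand-rolled table (the wrapper name `crc32_hash` is hypothetical):

```python
import zlib


def crc32_hash(data: bytes) -> int:
    """32-bit CRC of data as an unsigned integer."""
    # The mask is a no-op on Python 3, where zlib.crc32 is already
    # unsigned; it is kept so the result is unsigned everywhere.
    return zlib.crc32(data) & 0xFFFFFFFF


if __name__ == "__main__":
    print(f"{crc32_hash(b'some data object'):08x}")
```

The result is already 32 bits wide, so no truncation step is needed.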

Joey answered Oct 21 '22 16:10
