Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OK to use only 64-bits of sha1 hash as an id?

Tags:

hash

sha1

1) For the purpose of really low hash collision, can I get away with just using half of the 128 bits of a sha1 rather than dealing with the sha1 itself? I understand this is not suitable for cryptographic hashes, but I just need the hashes for hash table keys.

2) Computation time isn't a priority, and besides which I'm hashing very small pieces of data. In particular, I'm mostly going to be taking 2 or 3 64-bit hashes and hashing them to get another 64-bit hash. Is there a better option than sha1 for this purpose? Again, collisions should be very unlikely.

3) I'm a sql newb. Is it a good idea to use 64-bit hashes as id's in sql? Will 64-bit id's cause performance problems in sqlite or postgres? I'm going to need to coordinate data across multiple databases (including a Lucene index), so I figured I should deal with hashes directly in the tables rather than bothering with auto-incremented ids (which would only be meaningful in one db, not across all data stores). I figure 64-bit is a good compromise: big enough for unlikely collisions but saves on space (and lookup time?).

4) What about CRC-64? Does that produce a random enough distribution?

like image 575
Jegschemesch Avatar asked Apr 16 '09 03:04

Jegschemesch


1 Answers

If you have few enough records it's almost certain that you'll never have a hash collision in 64 bits. Likely you will fall into this category.

There should be no problem with trimming down a cryptographic hash like sha1, because if there were internal structure in the hash then it wouldn't be good enough to be a crypto hash, and if there's no structure then any subset of the bits should be quite random. Note that I'm only talking about using that for IDs, not for any crypto purposes!

But really, doesn't your SQL have some kind of GUID? And if it does, why not use it?

like image 185
dwc Avatar answered Dec 17 '22 04:12

dwc