I'm using SHA-1 to hash long strings down so that they can be used as keys in a database.
I'm trying to produce a UUID-sized string from the original string that is random enough and large enough to protect against collisions, but much smaller than the original string.
I'm not using this for anything security-related.
import hashlib

# Take a very long string, hash it down to a smaller string behind the scenes, and use
# the hashed key as the database primary key instead.
def _get_database_key(very_long_key):
    # hashlib works on bytes, so encode text first (required on Python 3)
    return hashlib.sha1(very_long_key.encode("utf-8")).digest()
Is SHA1 a good algorithm to be using for this purpose? Or is there something else that is more appropriate?
Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.
Chaining is a technique for resolving collisions in hash tables: keys that hash to the same index are stored together in a list (a chain) at that index. A collision occurs when two keys are hashed to the same index in a hash table.
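To make the idea concrete, here is a minimal sketch of a chained hash table. The class name ChainedHashTable and its methods are made up for illustration and are not from any library; a real database or Python's own dict handles collisions internally in its own way.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining (illustrative only)."""

    def __init__(self, num_buckets=8):
        # Each bucket is a list (the "chain") of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Map the key's hash onto one of the buckets.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))  # new key: append to the chain

    def get(self, key, default=None):
        # Walk the chain and compare the full key, not just the index.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


# A single bucket forces every key to collide, yet lookups still work.
table = ChainedHashTable(num_buckets=1)
table.put("a", 1)
table.put("b", 2)
```

Both keys end up in the same chain, and get() still returns the right value for each because the full key is compared during lookup.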
It's not possible to avoid collisions with a hash. If you have no collisions then you don't have a hashing function. The goal is to minimize collisions, not eliminate them. You'll always have possible collisions unless you have at least as many possible hashes as possible inputs, which sort of defeats the point of hashing.
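To get a feel for how small the risk can be made, here is a rough sketch using the standard birthday approximation. It assumes the hash output is uniformly distributed; the function name collision_probability is made up for this example.

```python
import math


def collision_probability(num_keys, hash_bits):
    """Approximate chance that at least two of num_keys distinct inputs share a hash.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)),
    which assumes uniformly distributed hash values.
    """
    space = 2 ** hash_bits
    exponent = -num_keys * (num_keys - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


# 160 bits: a full SHA-1 digest; 122 bits: the hash-derived bits of a
# version-5 UUID (128 bits minus 6 fixed version/variant bits); 32 bits
# is included only for contrast.
for bits in (160, 122, 32):
    print(bits, collision_probability(10**9, bits))
```

Under these assumptions, a billion keys are essentially certain to collide in a 32-bit space, while the chance for 122 or 160 bits is vanishingly small.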
With a uniform hash and a fixed number of keys, doubling the size of the table will halve the expected number of collisions. The latter strategy gives rise to an important property of hash tables that we have not seen in any other data structure.
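The halving claim can be checked with a short calculation. This sketch counts a "collision" as a pair of keys landing in the same slot and assumes each key is hashed uniformly and independently; both are assumptions made here, and the function name is made up.

```python
def expected_colliding_pairs(num_keys, table_size):
    """Expected number of key pairs that share a slot under uniform hashing."""
    # There are n(n-1)/2 pairs, and each pair collides with probability 1/m.
    pairs = num_keys * (num_keys - 1) / 2
    return pairs / table_size


small = expected_colliding_pairs(1000, 1024)
large = expected_colliding_pairs(1000, 2048)
print(small, large)
```

Because the table size appears only in the denominator, doubling it divides the expected count by two.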
Python has a uuid library, based on RFC 4122.
The version that uses SHA1 is UUIDv5, so the code would be something like this:
import uuid
uuid.uuid5(uuid.NAMESPACE_OID, 'your string here')
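Applied to the original question, this could look something like the sketch below. The namespace APP_NAMESPACE and the helper database_key are hypothetical names chosen for illustration, and "example.com" is a placeholder; any fixed namespace UUID would do, as long as it never changes once keys have been stored.

```python
import uuid

# Hypothetical application-specific namespace, derived once and then kept fixed.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def database_key(very_long_key):
    """Map an arbitrarily long string to a 36-character UUID string."""
    return str(uuid.uuid5(APP_NAMESPACE, very_long_key))


key = database_key("x" * 10_000)
print(key, len(key))
```

The same input always yields the same UUID, so the key can be recomputed from the original string whenever a lookup is needed.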