How would you convert an arbitrary string into a unique integer that is the same across Python sessions and platforms? For example, hash('my string') wouldn't work, because a different value is returned for each Python session and platform.
A hash function takes a variable-length sequence of bytes and converts it to a fixed-length value. Cryptographic hash functions are also one-way: if f is the hash function, calculating f(x) is fast and simple, but recovering x from f(x) is computationally infeasible.
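To illustrate the fixed-length property, here is a minimal sketch using SHA-256 from the standard hashlib module. The helper name digest_length is made up for this example; the choice of SHA-256 is an assumption, and any hashlib algorithm would show the same behaviour with its own digest size.

```python
import hashlib

def digest_length(data: bytes) -> int:
    """Return the length in bytes of the SHA-256 digest of data."""
    return len(hashlib.sha256(data).digest())

# Inputs of very different sizes produce digests of the same size.
short_len = digest_length(b"a")
long_len = digest_length(b"a" * 10_000)
print(short_len, long_len)
```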
As noted by many, Python's built-in hash() is no longer consistent across sessions for strings (as of version 3.3), because a random PYTHONHASHSEED is now used by default to address security concerns.
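One way to observe this is to compute hash() in fresh interpreter processes with different seeds. The sketch below assumes the current interpreter can be relaunched through sys.executable; the helper name hash_in_fresh_interpreter is hypothetical.

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(text: str, seed: str) -> int:
    """Run hash(text) in a new Python process with PYTHONHASHSEED set to seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)

# Same seed: same value. Different seeds: (almost certainly) different values.
same_a = hash_in_fresh_interpreter("my string", "1")
same_b = hash_in_fresh_interpreter("my string", "1")
other = hash_in_fresh_interpreter("my string", "2")
print(same_a == same_b, same_a == other)
```

Pinning PYTHONHASHSEED makes hash() repeatable for a given interpreter build, but it is not a portable solution: the value can still differ between Python versions and between 32-bit and 64-bit builds.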
A hash function doesn't have to (and can't) return a unique value for every string, since there are more possible strings than possible outputs. You could use the first 10 characters to seed a random number generator, use that generator to pick 100 random characters from the string, and hash those. This would take constant time regardless of the string's length.
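A rough sketch of that sampling idea follows. The function name sampled_hash, the use of SHA-256 for the final step, and the way the seed is derived from the prefix are all assumptions made for illustration. Only rng.random() is used to pick positions, since that is the method the random module documents as reproducible for a given seed.

```python
import hashlib
import random

def sampled_hash(text: str, sample_size: int = 100, prefix_len: int = 10) -> int:
    """Hash a fixed-size pseudo-random sample of text instead of the whole string."""
    if not text:
        return int(hashlib.sha256(b"").hexdigest(), 16)
    # Seed the generator from the first prefix_len characters (an integer seed,
    # so the result does not depend on Python's randomized hash()).
    seed = int.from_bytes(text[:prefix_len].encode("utf-8"), "big")
    rng = random.Random(seed)
    last = len(text) - 1
    # Pick sample_size positions; min() guards against float rounding at the top end.
    sample = "".join(
        text[min(int(rng.random() * len(text)), last)] for _ in range(sample_size)
    )
    return int(hashlib.sha256(sample.encode("utf-8")).hexdigest(), 16)

print(sampled_hash("x" * 1_000_000) == sampled_hash("x" * 1_000_000))
```

The trade-off is collisions: two strings that share a prefix and differ only at positions that were not sampled get the same value, so this only suits cases where speed matters more than distinguishing near-identical inputs.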
If you just want a good hash function and cannot wait, djb2 is one of the best string hash functions I know. It has excellent distribution and speed on many different sets of keys and table sizes. You are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc.
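For reference, a minimal Python sketch of djb2: start at 5381, then multiply by 33 and add each byte. The original is written in C with an unsigned long accumulator; masking to 32 bits and hashing the UTF-8 bytes are assumptions made here so the result is bounded and the same on every platform.

```python
def djb2(text: str) -> int:
    """Compute the djb2 hash of text over its UTF-8 bytes, truncated to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        # h * 33 + byte, kept within 32 bits to mimic fixed-width C arithmetic.
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

print(djb2("my string"))
```

Because it uses no random seed, the value is identical across sessions and platforms, but djb2 is not cryptographic and collisions are easy to construct.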
Use a hash algorithm such as MD5 or SHA1, then convert the hexdigest via int():
>>> import hashlib
>>> # In Python 3, hashlib works on bytes, so encode the string first.
>>> int(hashlib.md5('Hello, world!'.encode('utf-8')).hexdigest(), 16)
144653930895353261282233826065192032313
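The same approach can be wrapped in a small helper. This is a sketch under a few assumptions: the name stable_int is made up, SHA-256 is swapped in for MD5, and the bits parameter (an addition for illustration) truncates the result for cases where a smaller integer is needed, such as a 64-bit database column.

```python
import hashlib

def stable_int(text: str, bits: int = 64) -> int:
    """Map text to an integer in [0, 2**bits) that is stable across sessions."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    # Keep only the top `bits` bits of the 256-bit digest.
    return value >> (256 - bits)

print(stable_int("my string"))
```

Fewer bits means a higher chance of two different strings mapping to the same integer, so truncate only as far as the storage target requires.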