How is the hash value of a particular string calculated in CPython 2.7?
For instance, this code:
print hash('abcde' * 1000)
returns the same value even after I restart the Python process and try again (I did it many times).
So it seems that id() (the memory address) of the string isn't used in this computation, right? Then how is it computed?
A hash function takes a variable-length sequence of bytes and maps it to a fixed-length value. A cryptographic hash function is additionally one-way: if f is the hash function, computing f(x) is fast and simple, but recovering x from f(x) is computationally infeasible.
To calculate the hash of some data with hashlib's BLAKE2 functions, first construct a hash object by calling the appropriate constructor (blake2b() or blake2s()), then feed it the data by calling update() on the object, and finally get the digest by calling digest() (or hexdigest() for a hex-encoded string).
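As a minimal sketch of those three steps, assuming Python 3.6 or later (where hashlib provides blake2b and blake2s). The input bytes are only an illustrative choice:

```python
import hashlib

# Step 1: construct the hash object.
h = hashlib.blake2b()

# Step 2: feed it data; update() may be called repeatedly with more chunks.
h.update(b"abcde" * 1000)

# Step 3: read the result as raw bytes or as a hex-encoded string.
raw_digest = h.digest()
hex_digest = h.hexdigest()

print(len(raw_digest))  # 64, blake2b's default digest size in bytes
print(len(hex_digest))  # 128, two hex characters per byte
```

Note that this is a different mechanism from the built-in hash(): hashlib digests are meant for data integrity and cryptographic use, while hash() serves dictionary and set lookups.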
It's perfectly fine to represent hashes as strings. For example, SHA-256, one of the most popular hash functions, produces 256 bits (32 bytes), which are commonly written as a 64-character hexadecimal string made up of the digits 0-9 and the letters a-f.
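A quick way to check the relationship between the 256 bits and the 64-character string is sketched below. The helper name sha256_sizes is made up for this illustration, and the sample messages are arbitrary:

```python
import hashlib

def sha256_sizes(message):
    """Return (digest size in bits, length of the hex string) for message."""
    h = hashlib.sha256(message)
    return len(h.digest()) * 8, len(h.hexdigest())

# The output size stays fixed no matter how long the input is.
for message in (b"", b"abc", b"abcde" * 1000):
    print(len(message), sha256_sizes(message))
```

Each byte of the digest becomes two hexadecimal characters, which is why 32 bytes show up as 64 characters.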
For strings, the hash value depends on the contents of the object, not on its memory location. From the documentation:
Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0).
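A small experiment makes this concrete. It is only a sketch: the two strings are built in different ways so that they end up as separate objects, and the variable names are arbitrary:

```python
a = "abcde" * 1000
b = "".join(["abcde"] * 1000)  # same contents, built separately at run time

print(a == b)              # True: the contents are equal
print(id(a) == id(b))      # compares the addresses of the two objects
print(hash(a) == hash(b))  # True: equal strings always hash equally

# Numeric values that compare equal also share a hash, whatever their type.
print(hash(1) == hash(1.0))  # True
```

Within one process, equal strings must hash equally regardless of where they live in memory, otherwise dictionary lookups would break.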
See CPython's implementation of str.__hash__ in Objects/unicodeobject.c (for unicode_hash) and Python/pyhash.c (for _Py_HashBytes).
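For CPython 2.7 specifically, the str hash lives, as far as I know, in string_hash in Objects/stringobject.c rather than in the Python 3 files named above. The following is a rough pure-Python sketch of that algorithm, not the actual implementation. It assumes hash randomization is off (the 2.7 default, so the secret prefix and suffix are both 0) and a 64-bit C long. The function name py27_string_hash and the bits parameter are made up for this illustration:

```python
def py27_string_hash(data, bits=64):
    """Sketch of CPython 2.7's str hash, assuming randomization is disabled."""
    data = bytearray(data)  # iterate over integer byte values
    if not data:
        return 0  # the empty string hashes to 0
    mask = (1 << bits) - 1  # emulate wraparound of a C long

    x = (data[0] << 7) & mask
    for byte in data:
        x = ((1000003 * x) ^ byte) & mask
    x ^= len(data)

    # Reinterpret the unsigned value as a signed C long.
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    # -1 is reserved as an error marker in the C API.
    if x == -1:
        x = -2
    return x

print(py27_string_hash(b"a"))  # 12416037344
```

Only the bytes and the length of the string enter the computation, which matches the observation that the value is stable across restarts. On a 32-bit build, pass bits=32. In Python 3.3 and later, string hashing is randomized per process by default, so this stability no longer holds there unless PYTHONHASHSEED is fixed.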