>>> hash("\x01") 128000384 >>> hash("\x02") 256000771 >>> hash("\x03") 384001154 >>> hash("\x04") 512001541
Interesting part is 128000384 x 2
is not 256000771
, and also others
I am just wondering how that algorithm works and want to learn something on it.
So, there you have it: Python uses SipHash because it's a trusted, cryptographic hash function that should prevent collision attacks.
With modular hashing, the hash function is simply h(k) = k mod m for some m (usually, the number of buckets). The value k is an integer hash code generated from the key. If m is a power of two (i.e., m=2p), then h(k) is just the p lowest-order bits of k.
In hashing, large keys are converted into small keys by using hash functions. The values are then stored in a data structure called hash table. The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. Each element is assigned a key (converted key).
If you download the source code of Python, you will find it for sure! But bear in mind the hash function is implemented for each kind of objects differently.
For example, you will find the unicode hash function in Objects/unicodeobject.c
in the function unicode_hash
. You might have to look a bit more to find the string hash function. Find the structure defining the object you are interested in, and in the field tp_hash
, you will find the function that compute the hash code of that object.
For the string object: The exact code is found in Objects/stringobject.c
in the function string_hash
:
static long string_hash(PyStringObject *a) { register Py_ssize_t len; register unsigned char *p; register long x; if (a->ob_shash != -1) return a->ob_shash; len = Py_SIZE(a); p = (unsigned char *) a->ob_sval; x = *p << 7; while (--len >= 0) x = (1000003*x) ^ *p++; x ^= Py_SIZE(a); if (x == -1) x = -2; a->ob_shash = x; return x; }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With