Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the logic behind Python's hash function order?

As we know, Some of Python's data structures use hash tables for storing items like set or dictionary. So there is no order in these objects. But it seems that, for some sequences of numbers that's not true.

For example consider the following examples :

>>> set([7,2,5,3,6])
set([2, 3, 5, 6, 7])

>>> set([4,5,3,0,1,2])
set([0, 1, 2, 3, 4, 5])

But it isn't sorted if we make a small change :

>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])

So the question is: How does Python's hash function work on integer sequences?

like image 227
Mazdak Avatar asked Jan 09 '23 06:01

Mazdak


1 Answers

Although there are a lot of questions in SO about hash and its order,but no one of them explains the algorithm of hash function.

So all you need here is know that how python calculate the indices in hash table.

If you go through the hashtable.c file in CPython source code you'll see the following lines in _Py_hashtable_set function which shows the way python calculate the index of hash table keys :

key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);

So as the hash value of integers is the integer itself * (except for -1) the index is based on the number and the length of your data structure (ht->num_buckets - 1) and it calculated with Bitwise-and between (ht->num_buckets - 1) and the number.

Now consider the following example with set that use hash-table :

>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])

For number 33 we have :

33 & (ht->num_buckets - 1) = 1

That actually it's :

'0b100001' & '0b111'= '0b1' # 1 the index of 33

Note in this case (ht->num_buckets - 1) is 8-1=7 or 0b111.

And for 1919 :

'0b11101111111' & '0b111' = '0b111' # 7 the index of 1919

And for 333 :

'0b101001101' & '0b111' = '0b101' # 5 the index of 333

And as well as for the preceding examples in question :

>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])

'0b1000' & '0b100'='0b0' # for 8
'0b110' & '0b100'='0b100' # for 8

* The hash function for class int :

class int:
    def __hash__(self):
        value = self
        if value == -1:
            value = -2
        return value

like image 124
Mazdak Avatar answered Jan 15 '23 13:01

Mazdak