The docs incorrectly claim that:
Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is their id().
Although I recall this being correct once, the hash of such an object is apparently no longer equal to its id in current versions of Python (v2.7.10, v3.5.0).
>>> class A:
...     pass
...
>>> a = A()
>>> hash(a)
-9223372036578022804
>>> id(a)
4428048072
Another part of the docs says that the hash is derived from the id. When and why did the implementation change, and how is the number returned by hash "derived from" the id now?
The relevant function appears to be:
Py_hash_t
_Py_HashPointer(void *p)
{
    Py_hash_t x;
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    x = (Py_hash_t)y;
    if (x == -1)
        x = -2;
    return x;
}
(That code comes from here, and is then used as the tp_hash slot in type here.) The comment there seems to give a reason for not using the pointer (which, in CPython, is the same thing as the id) directly. Indeed, the commit that introduced that change to the function is here, and states that the reason for the change is:
Issue #5186: Reduce hash collisions for objects with no hash method by rotating the object pointer by 4 bits to the right.
which refers to this issue, which explains in more detail why the change was made.
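To see how the hash is "derived from" the id, the rotation can be reproduced in pure Python. This is only a sketch, assuming a 64-bit build (so SIZEOF_VOID_P is 8); the names hash_pointer and pointer_bits are made up for illustration and are not part of CPython. The slot calculation at the end is a deliberate simplification (it just takes the low 3 bits of the hash), not the real dict probing scheme.

```python
def hash_pointer(address, pointer_bits=64):
    """Mimic the C function above: rotate the address right by 4 bits.

    hash_pointer and pointer_bits are hypothetical names for this sketch;
    pointer_bits=64 assumes a 64-bit build.
    """
    mask = (1 << pointer_bits) - 1
    y = address & mask
    # Rotate right by 4: the low 4 bits wrap around to the top.
    y = ((y >> 4) | (y << (pointer_bits - 4))) & mask
    # Reinterpret the unsigned value as signed, like the cast to Py_hash_t.
    if y >= 1 << (pointer_bits - 1):
        y -= 1 << pointer_bits
    # -1 is reserved as an error return value, so it is mapped to -2.
    return -2 if y == -1 else y


# The id/hash pair from the question.
question_id = 4428048072
question_hash = hash_pointer(question_id)

# Simplified view of why the rotation helps: eight addresses that are
# 16 bytes apart all share the same low bits, so using the raw address
# would put them all in one slot of an 8-slot table.
addresses = [0x1000 + 16 * i for i in range(8)]
raw_slots = {a & 7 for a in addresses}
rotated_slots = {hash_pointer(a) & 7 for a in addresses}
```

With the id from the question, question_hash comes out as -9223372036578022804, the value hash(a) returned there. The low 4 bits of that id are 1000 in binary, so the rotation moves a set bit into the sign position, which is why the hash is negative. In the simplified slot view, the raw addresses all land in a single slot while the rotated values spread over all eight.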