I am quite often using funky stuff as keys for dictionaries, and therefore, I am wondering what is the right way to do it - and this goes through implementing good hash methods for my objects. I am aware of other questions asked here like good way to implement hash, but I'd like to understand how the default <code>__hash__</code> works for custom objects, and if it is possible to rely on it. I have noticed that mutables are explicitely unhashable since <code>hash({})</code> raises an error ... but strangely, custom classes are hashable : <pre class="prettyprint"><code>>>> class Object(object): pass >>> o = Object() >>> hash(o) </code></pre> So, does anybody knows how this default hash function works ? By understanding this, I'd like to know : Can I rely on this default hash, if I put objects of a same type as keys of a dictionary ? e.g. : <pre class="prettyprint"><code>key1 = MyObject() key2 = MyObject() key3 = MyObject() {key1: 1, key2: 'blabla', key3: 456} </code></pre> Can I rely on it if I use objects of different types as keys in a dictionary ? e.g. <pre class="prettyprint"><code>{int: 123, MyObject(10): 'bla', 'plo': 890} </code></pre> And in the last case also, how to make sure that my custom hashes don't clash with the builtin hashes ? e.g : <pre class="prettyprint"><code>{int: 123, MyObject(10): 'bla', MyObjectWithCustomHash(123): 890} </code></pre>

In Python 3 the following function is used on subclasses of <code>object</code> against the <code>id()</code> of the object (from <code>pyhash.c</code>) <pre class="prettyprint"><code>Py_hash_t _Py_HashPointer(void *p) { Py_hash_t x; size_t y = (size_t)p; /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid excessive hash collisions for dicts and sets */ y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4)); x = (Py_hash_t)y; if (x == -1) x = -2; return x; } </code></pre> <code>SIZEOF_VOID_P</code> is 8 for 64-bit Python and 4 for 32-bit Python. <pre class="prettyprint"><code>>>> class test: pass ... >>> a = test() >>> id(a) 4325845928 >>> hash(a) -9223372036584410438 </code></pre> You can see that the hash is calculated from <code>id(a)</code> using the formula <code>(id(a) >> 4) | (id(a) << (8 * SIZEOF_VOID_P - 4))</code>, where the bitwise operations are performed on <code>C</code> signed integers. For example, for the <code>a</code> defined above: <pre class="prettyprint"><code>>>> import numpy >>> y = numpy.array([4325845928], dtype='int64') >>> SIZEOF_VOID_P = 8 >>> (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4)) array([-9223372036584410438]) </code></pre> Note that I am using <code>numpy.array(dtype='int64')</code> so that the bitwise operations act the same way they would in C (if you perform the same operation on Python ints you get different behavior because they don't overflow). See https://stackoverflow.com/a/5994397/161801.

What is the default hash in python?

Tags:

python

I am quite often using funky stuff as keys for dictionaries, and therefore, I am wondering what is the right way to do it - and this goes through implementing good hash methods for my objects. I am aware of other questions asked here like good way to implement hash, but I'd like to understand how the default __hash__ works for custom objects, and if it is possible to rely on it.

I have noticed that mutables are explicitely unhashable since hash({}) raises an error ... but strangely, custom classes are hashable :

>>> class Object(object): pass >>> o = Object() >>> hash(o)

So, does anybody knows how this default hash function works ? By understanding this, I'd like to know :

Can I rely on this default hash, if I put objects of a same type as keys of a dictionary ? e.g. :

key1 = MyObject() key2 = MyObject() key3 = MyObject() {key1: 1, key2: 'blabla', key3: 456}

Can I rely on it if I use objects of different types as keys in a dictionary ? e.g.

{int: 123, MyObject(10): 'bla', 'plo': 890}

And in the last case also, how to make sure that my custom hashes don't clash with the builtin hashes ? e.g :

{int: 123, MyObject(10): 'bla', MyObjectWithCustomHash(123): 890}

642

asked Jul 04 '12 07:07

sebpiq

2 Answers

What you can rely on: custom objects have a default hash() that is based in some way on the identity of the object. i.e. any object using the default hash will have a constant value for that hash over its lifetime and different objects may or may not have a different hash value.

You cannot rely on any particular relationship between the value returned by id() and the value returned by hash(). In the standard C implementation of Python 2.6 and earlier they were the same, in Python 2.7-3.2 hash(x)==id(x)/16.

Edit: originally I wrote that in releases 3.2.3 and later or 2.7.3 or later the hash value may be randomised and in Python 3.3 the relationship will always be randomised. In fact that randomisation at present only applies to hashing strings so in fact the divide by 16 relationship may continue to hold for now, but don't bank on it.

Hash collisions don't usually matter: in a dictionary lookup to find an object it must have the same hash and must also compare equal. Collisions only matter if you get a very high proportion of collisions such as in the denial of service attack that led to recent versions of Python being able to randomise the hash calculation.

106

answered Oct 13 '22 21:10

Duncan

In Python 3 the following function is used on subclasses of object against the id() of the object (from pyhash.c)

Py_hash_t _Py_HashPointer(void *p) {     Py_hash_t x;     size_t y = (size_t)p;     /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid        excessive hash collisions for dicts and sets */     y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));     x = (Py_hash_t)y;     if (x == -1)         x = -2;     return x; }

SIZEOF_VOID_P is 8 for 64-bit Python and 4 for 32-bit Python.

>>> class test: pass ... >>> a = test() >>> id(a) 4325845928 >>> hash(a) -9223372036584410438

You can see that the hash is calculated from id(a) using the formula (id(a) >> 4) | (id(a) << (8 * SIZEOF_VOID_P - 4)), where the bitwise operations are performed on C signed integers. For example, for the a defined above:

>>> import numpy >>> y = numpy.array([4325845928], dtype='int64') >>> SIZEOF_VOID_P = 8 >>> (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4)) array([-9223372036584410438])

Note that I am using numpy.array(dtype='int64') so that the bitwise operations act the same way they would in C (if you perform the same operation on Python ints you get different behavior because they don't overflow). See https://stackoverflow.com/a/5994397/161801.

answered Oct 13 '22 20:10

asmeurer

Related questions
                            
                                run program in Python shell
                            
                                Pyaudio installation error - 'command 'gcc' failed with exit status 1'
                            
                                Adding line breaks in ipython
                            
                                How do I get the time a file was last modified in Python? [duplicate]
                            
                                How to send image generated by PIL to browser?
                            
                                using pip3: module "importlib._bootstrap" has no attribute "SourceFileLoader"
                            
                                Getting the remaining arguments in argparse
                            
                                PIL rotate image colors (BGR -> RGB)
                            
                                Python 2.x gotchas and landmines [closed]
                            
                                Python sign SOAP request using BinarySecurityToken
                            
                                Python Multiprocessing.Pool lazy iteration
                            
                                What is the advantage of setting zip_safe to True when packaging a Python project?
                            
                                Pip build option to use multicore
                            
                                Python imports for tests using nose - what is best practice for imports of modules above current package
                            
                                Too many different Python versions on my system and causing problems
                            
                                How exactly is Python Bytecode Run in CPython?
                            
                                Difference between class foo and class foo(object) in Python
                            
                                Flask - POST Error 405 Method Not Allowed
                            
                                JSON serialize a dictionary with tuples as key
                            
                                When should I use a Map instead of a For Loop?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is the default hash in python?

Tags:

python

sebpiq

People also ask

2 Answers

Duncan

asmeurer

Recent Activity

Donate For Us

What is the default __hash__ in python?

Tags:

python

sebpiq

People also ask

2 Answers

Duncan

asmeurer

Related questions

Recent Activity

Donate For Us

What is the default hash in python?