
How does python compute the hash of a tuple

Tags:

python

hash

tuples

In Python, if I have a tuple with many elements, is its hash calculated from its elements' ids or its elements' content?

In this example,

    a = (1, [1, 2])
    hash(a)

It errors out saying list is unhashable. So I guess it's not computed by id, or probably there is a check on whether the element is mutable.

Now see this example

    class A: pass
    a0 = A()
    ta = (1, a0)
    hash(ta)  # -1122968024
    a0.x = 20
    hash(ta)  # -1122968024

Here it turns out the hash of ta does not change with the modification of its element, i.e., a0. So maybe a0's id is used for the hash calculation? Is a0 somehow considered as immutable? How does python know if a type is mutable?

Now consider this case

    b = (1, 2)
    id(b)  # 3980742764
    c = (1, 2)
    id(c)  # 3980732588
    tb = (1, b)
    tc = (1, c)
    hash(tb)  # -1383040070
    hash(tc)  # -1383040070

It seems the contents of b and c are used for the hash calculation.

How should I understand these examples?


asked Apr 08 '18 20:04

nos

2 Answers

"If I have a tuple with many elements, is its hash calculated from its elements' ids or its elements' content?"

Neither. It is calculated on the basis of the hashes of these elements, not their "contents" (values/attributes), nor IDs.
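A small sketch can make this visible. The class Loud below is hypothetical (it is not from the question); it only counts how often its __hash__ is called, which suggests that hashing a tuple delegates to each element's own hash:

```python
class Loud:
    """Hypothetical element type that counts every __hash__ call."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Loud) and self.value == other.value

    def __hash__(self):
        Loud.calls += 1
        return hash(self.value)


t = (1, Loud("a"), Loud("b"))
hash(t)
print(Loud.calls)  # 2: one call per Loud element
```

Neither the attributes nor the id() of the elements is inspected by the tuple itself; whatever each element's __hash__ returns is all the tuple gets to see.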

The basics: why hashes are used the way they are

Take a look at the "hashable" entry in the glossary of Python's documentation.

Whether something is hashable or not, and how it is hashed, depends on the implementation of its __hash__() method. By itself, Python has no idea about mutability of an object.

A hash is useful in identification of objects. For example, it speeds up data retrieval from a dict, identifying the arbitrary value of a key by a single numerical value from a finite interval - the key's hash.

A hash should remain unchanged throughout the lifetime of the object. Otherwise, one object could map to two different values in a dict, or be included into a set twice, as soon as its hash changes.
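To illustrate what goes wrong otherwise, here is a minimal sketch with a hypothetical class Drifting whose hash (wrongly) depends on a mutable attribute. The described behaviour is what CPython does; the class name and attribute are made up for the example:

```python
class Drifting:
    """Hypothetical class whose hash depends on mutable state (a bad idea)."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, Drifting) and self.n == other.n

    def __hash__(self):
        return hash(self.n)


d = Drifting(1)
seen = {d}
d.n = 2            # the hash changes while the object sits in the set
print(d in seen)   # False: the lookup uses the new hash
print(len(seen))   # 1: the object is still stored, just unreachable
```

The object is in the set and cannot be found in it at the same time, which is exactly the kind of weirdness the stable-hash requirement is meant to rule out.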

It's not enough to compare two objects by their hashes: at the end of the day, you may still need to perform equality checks, because there may be a collision between the hashes of different objects. That's why hashable objects are required to have __eq__() implemented.
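As a rough illustration, the hypothetical class SameBucket below forces every instance to collide on purpose. A dict still keeps the keys apart, because it falls back to __eq__ once the hashes match:

```python
class SameBucket:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, SameBucket) and self.name == other.name

    def __hash__(self):
        return 42  # deliberate collision for every instance


prices = {SameBucket("tea"): 3, SameBucket("coffee"): 4}
print(len(prices))                   # 2: equal hashes, unequal keys
print(prices[SameBucket("coffee")])  # 4: __eq__ picks the right entry
```

A constant hash is legal but slow, since every lookup degrades into a chain of equality checks.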

This ties back to mutability: if a hashable object mutates in a way that changes how it compares with other hashable objects, especially ones with the same hash, it breaks the contract and may cause the same weirdness a changing hash would. Mutating a hashable object should never change the result of its equality comparisons.

Hashable objects that are equal to each other should have the same hash. This is a general contract that makes everything else simpler - it's natural to assume x == y implies that both x and y map to the same value in a dict.
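The built-in numeric types are a handy way to see this contract at work. A short sketch using only built-ins:

```python
print(1 == 1.0 == True)                    # True
print(hash(1) == hash(1.0) == hash(True))  # True: equal values, equal hashes

d = {1: "int key"}
d[1.0] = "float key"  # 1.0 == 1, so this overwrites the existing entry
print(d)              # {1: 'float key'}
```

Three different types, three different objects, yet one dict entry, because equality and hash agree.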

Hash of a tuple

Consider your first example. The tuple hashes itself on the basis of its elements, while its second element, the list, doesn't have a hash at all: list explicitly sets its __hash__ to None. And so the tuple.__hash__ method fails.

That's why a tuple with a list object inside of it is not hashable. As you can see, it is therefore also incorrect to say that a tuple's hash is based on the IDs of its elements.

Note that if the list were hashable here, and its hash were based on its elements, changing them would change the hash of the outer tuple, breaking the contract.
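This can be checked directly. A minimal sketch (the variable name message is just for the example):

```python
print(list.__hash__)  # None: list opts out of hashing

try:
    hash((1, [1, 2]))
except TypeError as exc:
    message = str(exc)
    print(message)    # the message names the unhashable type, list

# Swap the list for a tuple and the outer tuple hashes fine.
print(hash((1, (1, 2))) == hash((1, (1, 2))))  # True
```

The error comes from the list element, not from any mutability check performed by the tuple.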

Why doesn't my custom class require a __hash__()?

Let's have a look at Python's data model documentation, and what it has to say on the topic:

"User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y)."

Put simply, the default implementation compares object identity, which has nothing to do with the object's attributes. That's why you can change the values "inside" an object of your custom class without changing its hash.

That's also why you don't have to define __hash__() for your classes - Python does it for you in this case.

In this regard you're right - the default (CPython's) implementation of the hashing function for custom classes relies on the id() of an object (and not on the values "inside" of it). It is an implementation detail, and it differs between Python versions.
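Here is a short sketch of that default behaviour. Class A is the one from the question; Point is a hypothetical class added to show the flip side from the same data model documentation: in Python 3, a class that overrides __eq__ without defining __hash__ has its __hash__ set to None.

```python
class A:
    pass


class Point:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


a0 = A()
before = hash(a0)
a0.x = 20
print(hash(a0) == before)  # True: attributes play no part in the default hash
print(A() == A())          # False: default equality is identity

print(Point.__hash__)      # None: overriding __eq__ alone disables hashing
```

So the "free" hash only comes with the "free" identity-based equality; once equality depends on attributes, you have to decide yourself how (and whether) the object hashes.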

Hash randomization in more recent versions of Python is a separate mechanism: it salts the hashes of str and bytes objects, not the identity-based default hash. It prevents some forms of denial of service attacks, where creating arbitrary hash collisions could significantly slow down web applications. See PEP 456.

How does it actually hash itself?

While the details are quite complicated and involve some bit-level arithmetic, the implementation of the hash function for tuple objects is written in C, and can be seen in CPython's Objects/tupleobject.c (see static Py_hash_t tuplehash(PyTupleObject *v)).

The calculation mixes a running value with the hash of each of the tuple's elements: older CPython versions XOR and multiply, while CPython 3.8 and later use a simplified variant of xxHash. In the older version, the line responsible for hashing of the elements is this one:

    y = PyObject_Hash(*p++);

So, to answer your original question: it does a bunch of XOR hocus-pocus with the hashes of each of its elements. Whether or not the contents and attributes of these elements are considered depends on their specific hash functions.
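To make the shape of that computation concrete, here is a toy combiner in Python. It is explicitly NOT CPython's real algorithm: the function name toy_tuple_hash, the starting constant and the multiplier are assumptions chosen for this sketch. The only point is that it consumes hash(item) for each item, in order, and nothing else:

```python
def toy_tuple_hash(items):
    """Toy combiner, NOT CPython's real algorithm.

    Mixes only hash(item) for each item, in order.
    """
    acc = 0x345678  # arbitrary starting constant for this sketch
    for item in items:
        # XOR in the element's hash, multiply, and keep 64 bits.
        acc = ((acc ^ hash(item)) * 1000003) & 0xFFFFFFFFFFFFFFFF
    return acc


print(toy_tuple_hash((1, (1, 2))) == toy_tuple_hash((1, (1, 2))))  # True
print(toy_tuple_hash((1, 2)) == toy_tuple_hash((2, 1)))            # False: order matters

try:
    toy_tuple_hash((1, [1, 2]))
except TypeError:
    print("unhashable element propagates")
```

Even this toy version reproduces the behaviours from the question: equal elements give equal results, and a list element makes the whole computation fail.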


answered Oct 01 '22 12:10

Błażej Michalik

The core contract of hashing is that equal objects have equal hashes. In particular, hashing does not directly care about mutability or mutation; it only cares about mutation that affects equality comparisons.

Your first tuple is unhashable because mutating the nested list would change how the tuple behaves in equality comparisons.

Mutating a0 in your second example doesn't affect the hash of the tuple because it doesn't affect equality comparisons. a0 is still only equal to itself, and its hash is unchanged.

tb and tc in your third example have equal hashes because they are equal tuples, regardless of whether their elements are the same objects.

This all means that tuples cannot (directly) use id for hashes. If they did, equal tuples with distinct but equal elements could hash differently, violating the contract of hashing. Without special-casing element types, the only things tuples can use to compute their own hashes are their elements' hashes, so tuples base their hashes on their elements' hashes.
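The third example from the question can be replayed as a quick check. In this sketch b and c are built at runtime with tuple([...]), so that (at least in CPython) they are distinct objects rather than one shared constant:

```python
b = tuple([1, 2])  # built at runtime; a separate object from c in CPython
c = tuple([1, 2])
tb = (1, b)
tc = (1, c)

print(tb == tc)              # True
print(hash(tb) == hash(tc))  # True: equal tuples hash equally

lookup = {tb: "found"}
print(lookup[tc])            # found: an equal key reaches the same entry
```

If the tuple hash were derived from id(b) and id(c), the last lookup could miss even though tb == tc.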

answered Oct 01 '22 12:10

user2357112 supports Monica
