Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What happens if an object's __hash__ changes?

Tags:

In Python, I know that the value __hash__ returns for a given object is supposed to be the same for the lifetime of that object. But, out of curiosity, what happens if it isn't? What sort of havoc would this cause?

class BadIdea(object):   def __hash__(self):     return random.randint(0, 10000) 

I know __contains__ and __getitem__ would behave strangely, and dicts and sets would act odd because of that. You also might end up with "orphaned" values in the dict/set.

What else could happen? Could it crash the interpreter, or corrupt internal structures?

like image 972
Richard Levasseur Avatar asked Apr 18 '14 19:04

Richard Levasseur


People also ask

What does __ hash __ do?

Introduction to the Python hash function The hash() function accepts an object and returns the hash value as an integer. When you pass an object to the hash() function, Python will execute the __hash__ special method of the object.

Can mutable objects be hashed?

They all compare unequal (except with themselves), and their hash value is derived from their id(). This feels somewhat weird... so user-defined mutable objects are hashable (via this default hashing mechanism) but built-in mutable objects are not hashable.

Why are immutable objects hashable?

We could say that all immutable objects are hashable, because if the hash changes during the object's lifetime, then it means that the object mutated. But not quite. Consider a tuple which has a list (mutable). Some say tuple is immutable, but at the same time it is somewhat not hashable (throws).

Are objects hashable?

00:23 All immutable objects are hashable, but not all hashable objects are immutable. That's just a rule, so anytime you see some example immutable object, you know for a fact that it's hashable, but there are some cases where there are hashable objects that you actually can mutate.

Why is the hash of an object mutable?

This is important basically because we want the hash of an object to remain the same across the object's lifetime. But if we have a mutable object, then that object itself can change over its lifetime. But then according to our first bullet point above, that object's hash has to change too.

Why can't I hash an object in Python?

This is because Python has an additional restriction on hashing: In order for an object to be hashable, it must be immutable. This is important basically because we want the hash of an object to remain the same across the object's lifetime. But if we have a mutable object, then that object itself can change over its lifetime.

Is the hash of an object ever re-computed?

The hash of an object is never re-computed once it is inserted. Objects that implement logical equality (e.g. implement __eq__) must be immutable to be hashable.

Why do we need a hash function?

Since there are generally more possible objects than hash values, two objects may hash to the same value. This is called a hash collision, and anything that deals with hashes should be able to deal with them. This won't be discussed here, but an additional property that a good hash function should satisfy to be useful is this:


2 Answers

Your main problem would indeed be with dicts and sets. If you insert an object into a dict/set, and that object's hash changes, then when you try to retrieve that object you will end up looking in a different spot in the dict/set's underlying array and hence won't find the object. This is precisely why dict keys should always be immutable.

Here's a small example: let's say we put o into a dict, and o's initial hash is 3. We would do something like this (a slight simplification but gets the point across):

 Hash table:    0   1   2   3   4   5   6   7 +---+---+---+---+---+---+---+---+ |   |   |   | o |   |   |   |   | +---+---+---+---+---+---+---+---+               ^               we put o here, since it hashed to 3 

Now let's say the hash of o changes to 6. If we want to retrieve o from the dict, we'll look at spot 6, but there's nothing there! This will cause a false negative when querying the data structure. In reality, each element of the array above could have a "value" associated with it in the case of a dict, and there could be multiple elements in a single spot (e.g. a hash collision). Also, we'd generally take the hash value modulo the size of the array when deciding where to put the element. Irrespective of all these details, though, the example above still accurately conveys what could go wrong when the hash code of an object changes.

Could it crash the interpreter, or corrupt internal structures?

No, this won't happen. When we say an object's hash changing is "dangerous", we mean dangerous in the sense that it essentially defeats the purpose of hashing and makes the code difficult if not impossible to reason about. We don't mean dangerous in the sense that it could cause crashes.

like image 80
arshajii Avatar answered Sep 25 '22 22:09

arshajii


There's a great post on Github about it: What happens when you mess with hashing. First, you need to know that Python expects (quoted from the article):

  • The hash of an object does not change across the object's lifetime (in other words, a hashable object should be immutable).

  • a == b implies hash(a) == hash(b) (note that the reverse might not hold in the case of a hash collision).

Here's the code example which show the problem of a variant hash, with a slightly different class example, but the idea remains the same:

>>> class Bad(object):  ...     def __init__(self, arg):  ...         self.arg = arg  ...     def __hash__(self):  ...         return hash(self.arg)  ...  >>> Bad(1)  <__main__.Bad object at ...>  >>> hash(Bad(1))  1  >>> a = Bad(1)  >>> b = {a:1}  >>> a.arg = 2  >>> hash(a)  2  >>> b[a]  Traceback (most recent call last): ... KeyError: <__main__.Bad object at ...> 

Here, we implicitly changed the hash of a by mutating the argument of a that is used to compute the hash. As a result, the object is no longer found in a dictionary, which uses the hash to find the object.

Note that Python doesn't prevent me from doing this. I could make it if I want, by making __setattr__ raise AttributeError, but even then I could forcibly change it by modifying the object's __dict__. This is what is meant when we say that Python is a "consenting adults" language.

It won't make Python crash but unexpected behavior will happen with dict/set and everything based on object hash.

like image 42
Maxime Lorant Avatar answered Sep 24 '22 22:09

Maxime Lorant