Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why mutable built-in objects cannot be hashable in Python? What is the benefit of this?

I come from Java where even mutable objects can be "hashable".
And I am playing with Python 3.x these days just for fun.

Here is the definition of hashable in Python (from the Python glossary).

hashable

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().

I read it and I am thinking... Still... Why didn't they make in Python even mutable objects hashable? E.g. using the same default hashing mechanism as for user-defined objects i.e. as described by the last 2 sentences above.

Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().

This feels somewhat weird... so user-defined mutable objects are hashable (via this default hashing mechanism) but built-in mutable objects are not hashable. Doesn't this just complicate things? I don't see what benefits it brings, could someone explain?

like image 635
peter.petrov Avatar asked May 06 '19 21:05

peter.petrov


People also ask

Why are mutable types not hashable?

While none of the built-in mutable objects are hashable, it is possible to make a mutable object with a hash value that's not mutable. It's common for only a portion of the object to represent its identity, while the rest of the object contains properties that are free to change.

Which of the object can not be hashable in Python?

All immutable built-in objects in Python are hashable like tuples while the mutable containers like lists and dictionaries are not hashable. Objects which are instances of the user-defined class are hashable by default, they all compare unequal, and their hash value is their id().

What does it mean to be hashable in Python?

A hashable Python object is any object that has a hash value — an integer identificator of that object which never changes during its lifetime. To check if an object is hashable or not and find out its hash value (if it's hashable), we use the hash() function on this object: print(hash(3.14))Output: 322818021289917443.

Why list in Python is not hashable?

This error occurs when trying to hash a list, which is an unhashable object. For example, using a list as a key in a Python dictionary will cause this error since dictionaries only accept hashable data types as a key. The standard way to solve this issue is to cast a list to a tuple, which is a hashable data type.


2 Answers

In Python, mutable objects can be hashable, but it is generally not a good idea, because generally speaking, the equality is defined in terms of these mutable attributes, and this can lead to all sorts of crazy behavhior.

If built-in mutable objects are hashed based on identity, like the default hashing mechanism for user-defined objects, then their hash would be inconsistent with their equality. And that is absolutely a problem. However, user-defined objects by default compare and hash based on identity, so it isn't as bad of a situation, although, this set of affairs isn't very useful.

Note, if you implement __eq__ in a user-defined class, the __hash__ is set to None, making the class unhashable.

So, from the Python 3 data model documentation:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the __hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking isinstance(obj, collections.abc.Hashable).

like image 155
juanpa.arrivillaga Avatar answered Oct 29 '22 02:10

juanpa.arrivillaga


Calculating a hash value is like giving an identity to an object which simplify the comparison of objects. The comparison by hash value is generally faster than the comparison by value: for an object, you compare its attributes, for a collection, you compare its items, recursively…

If an object is mutable you need to calculate its hash value again after each changes. If this object was compared equal with another one, after a change it becomes unequal. So, mutable objects must be compared by value, not by hash. It’s a non-send to compare by hash values for mutable objects.

Edit: Java HashCode

Typically, hashCode() just returns the object's address in memory if you don't override it.

See the reference about the hashCode function.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

So, the Java hashCode function works the same as the default Python __hash__ function.

In Java, if you use a mutable object in a HashSet, for instance, the HashSet isn’t working properly. Because the hashCode depends of the state of the object it can no longer be retrieved properly, so the check for containment fails.

like image 1
Laurent LAPORTE Avatar answered Oct 29 '22 03:10

Laurent LAPORTE