
Using an object's id() as a hash value

Tags: python, hash

Is it a bad idea to implement __hash__ like so?

class XYZ:
    def __init__(self):
        self.val = None

    def __hash__(self):
        return id(self)

Am I setting up something potentially disastrous?

asked Mar 10 '19 at 08:03 by Ed Danileyko




2 Answers

The __hash__ method has to satisfy the following requirement in order to work:

For all x, y such that x == y, hash(x) == hash(y).

In your case the class does not implement __eq__, which means that x == y if and only if id(x) == id(y) (that is, x is y), and thus your hash implementation satisfies the above property.

Note, however, that if you do implement __eq__, this implementation will likely fail: two distinct instances that compare equal would still have different hashes.
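To make that failure concrete, here is a minimal sketch. The Point class and its x field are hypothetical, invented only for illustration; the behavior described in the comments is what I'd expect on CPython.

```python
class Point:
    """Hypothetical class: value-based __eq__ combined with an id-based __hash__."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

    def __hash__(self):
        return id(self)  # inconsistent with __eq__ above


a = Point(1)
b = Point(1)

equal = a == b                  # the two objects compare equal
same_hash = hash(a) == hash(b)  # but they are distinct live objects, so their ids differ
found = b in {a}                # the set looks b up by its hash and misses a
```

Here equal is true while same_hash and found are false, which is exactly the violation of the requirement above.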

Also: there is a difference between having a "valid" __hash__ and having a good one. For example, the following is a valid __hash__ definition for any class:

def __hash__(self):
    return 1

A good hash should distribute objects as uniformly as possible in order to avoid collisions. This usually requires a more complex definition. I'd avoid trying to come up with formulas and instead rely on Python's built-in hash function.
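As a rough illustration of why a constant hash is valid but poor, the sketch below counts how often __eq__ gets called while building a set. The ConstantHash class and its eq_calls counter are hypothetical names used only for this demonstration.

```python
class ConstantHash:
    """Hypothetical class whose hash is valid but collides for every instance."""

    eq_calls = 0  # counts equality checks triggered by hash collisions

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        ConstantHash.eq_calls += 1
        return isinstance(other, ConstantHash) and self.val == other.val

    def __hash__(self):
        return 1  # same hash for every instance


# Every insertion has to be compared against all elements already stored,
# so building the set costs a quadratic number of __eq__ calls.
items = {ConstantHash(i) for i in range(100)}
eq_calls = ConstantHash.eq_calls
```

The set still ends up with 100 distinct elements, so the hash is "valid", but each insertion must be checked against every element already present, which is the cost a well-distributed hash avoids.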

For example, if your class has fields a, b and c, then I'd use something like this as __hash__:

def __hash__(self):
    return hash((self.a, self.b, self.c))

The definition of hash for tuples should be good enough for the average case.
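A fuller sketch of that approach, pairing the tuple-based __hash__ with an __eq__ built from the same fields so the two stay consistent. The Record class and the _key helper are hypothetical names, not anything from the question.

```python
class Record:
    """Hypothetical class with fields a, b and c that are never mutated."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def _key(self):
        # Single source of truth for both equality and hashing.
        return (self.a, self.b, self.c)

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


r1 = Record(1, "x", 2.0)
r2 = Record(1, "x", 2.0)
lookup = {r1: "found"}
value = lookup[r2]  # r2 is equal to r1 and hashes the same, so the lookup succeeds
```

Deriving both methods from one helper makes it hard to accidentally compare on one set of fields and hash on another.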

Finally: you should not define __hash__ in classes that are mutable (in the fields used for equality). Modifying an instance changes its hash, and an object already stored in a dict or set can then no longer be found there.
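Here is a small sketch of what "break things" looks like in practice. MutableKey is a hypothetical class, and the results in the comments are what I'd expect from CPython's dict.

```python
class MutableKey:
    """Hypothetical mutable class that hashes on a field it allows to change."""

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.val == other.val

    def __hash__(self):
        return hash(self.val)


k = MutableKey(1)
d = {k: "value"}

before = k in d             # found: stored under hash(1)
k.val = 2                   # mutate a field used by __eq__ and __hash__
after = k in d              # looked up under hash(2), so the entry is missed
stale = MutableKey(1) in d  # old hash matches, but the stored key no longer equals it
size = len(d)               # the entry is still in the dict, just unreachable by key
```

After the mutation the dict still holds one entry, but neither the mutated key nor a fresh key with the old value can reach it.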

answered Oct 02 '22 at 17:10 by Bakuriu


It's either pointless or wrong, depending on the rest of the class.

If your objects use the default identity-based ==, then defining this __hash__ is pointless. The default __hash__ is also identity-based, but faster, and tweaked to avoid always having the low bits set to 0. Using the default __hash__ would be simpler and more efficient.
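A quick check of the first case, using a hypothetical Plain class that defines neither __eq__ nor __hash__: it inherits both from object and already works as a set member or dict key.

```python
class Plain:
    """Hypothetical class relying entirely on the inherited defaults."""

    def __init__(self):
        self.val = None


p = Plain()
q = Plain()

inherits_default = Plain.__hash__ is object.__hash__  # no custom __hash__ needed
p_found = p in {p}      # an object is found by identity
q_found = q in {p}      # a different instance is not equal, so not found
```

Nothing is gained by adding an id-based __hash__ on top of this.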

If your objects don't use the default identity-based ==, then your __hash__ is wrong, because it's going to be inconsistent with ==. If your objects are immutable, you should implement __hash__ in a way that is consistent with ==; if your objects are mutable, you should not implement __hash__ at all (and set __hash__ = None if you need to support Python 2).
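For the mutable case, a minimal sketch under Python 3: a class that defines __eq__ without __hash__ has its __hash__ implicitly set to None, so instances are rejected as dict keys or set members. The Mutable class is a hypothetical example.

```python
class Mutable:
    """Hypothetical mutable class: defines __eq__ and deliberately no __hash__."""

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return isinstance(other, Mutable) and self.val == other.val

    # On Python 2 you would add `__hash__ = None` here explicitly;
    # Python 3 does this automatically when __eq__ is defined without __hash__.


try:
    hash(Mutable(1))
    hashable = True
except TypeError:
    # Raised because Mutable.__hash__ is None.
    hashable = False
```

Getting a TypeError up front is much easier to debug than a key that silently goes missing later.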

answered Oct 02 '22 at 19:10 by user2357112 supports Monica