Apparently python will allow me to hash a generator expression like (i for i in [1, 2, 3, 4, 5])
>>> hash(i for i in [1, 2, 3, 4, 5])
8735741846615
On closer inspection however, this hash value is always the same no matter what generator I put into it!
>>> hash(i for i in range(2))
8735741846615
>>> hash(i for i in [1, 2, 3, 4, 5, 6])
8735741846615
>>> hash(i for i in [0, 1, 2, 3, 4, 5, 6])
8735741846615
why is this happening? Why is hashing a generator even allowed?
I need to do this because I’m storing stuff in a dictionary cache. You access entries in the cache by passing lists of objects. But certain groups of lists with different items should still point to the same entry, since the only thing that matters here is the integer value of one attribute in each list item. So they should hash only based on those integer values, not anything relating to the items themselves, to avoid unnecessary cache misses.
I'm aware you can always just convert the expression to a tuple, but I was asking if you can bypass that by simply using the output of the generator without the tuple container, similar to how sum()
could be used for such a thing.
So there are two questions here actually:
For 2, the answer is simple: In this case, hash is based on the object's id
. Since you don't actually store the object, its memory gets reused. That means the next generator has the same id
and thus hash.
For 1, the answer is "because they can". hash
is primarily meant for use in dict
, set
and other situations where it allows identifying an object. These situations set the constraint that a == b
also implies hash(a) == hash(b)
(the reverse is not constrained).
Now, for list
, dict
and other collections, equality is based on content. [1,2,3] == [1,2,3]
regardless whether both are the same objects, for example. This means if something is added to them, their equality changes and thus their hash
would change as well. Thus, hash
is undefined, as it must be a constant for it to work in dict
etc.
In contrast, a generator can have any content. Consider for example a generator providing random values. Thus, it makes no sense to compare generators by content. They are only ever compared by identity. So, a == b
equals id(a) == id(b)
for generators. In turn, this means basing hash(a)
on id(a)
will always satisfy the constraint by equality on hash
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With