Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can lists not be used as dictionary keys? [duplicate]

I would like to have a list that is a key in a dictionary, defined like that:

data = { 
  [24,48,96]: ["QN.FN.EQ", "OT.AR.LN", "BL.VL.TR"] 
}

This does not work... the error says because "list type is not hashable...".

Is there any workaround? In order to be able to get data from that dictionary like this:

data[[24,48,96]] # => ["QN.FN.EQ", "OT.AR.LN", "BL.VL.TR"]

The only solution that I have now is - to convert the list to a string and use strings as keys.

data = { 
  "24,48,96": ["QN.FN.EQ", "OT.AR.LN", "BL.VL.TR"] 
}
arr = [24,48,96]
print(data[','.join(map(str,arr))])
like image 812
BladeMight Avatar asked Dec 23 '22 03:12

BladeMight


2 Answers

I'm answering the question in the title of this post. :)

Because lists are mutable, dict keys need to be hashable, and hashing mutable objects is a bad idea because hash values should be computed on the basis of instance attributes.

Example 1: hashing a mutable object where the hash value is based on a mutable characteristic of the object.

>>> class stupidlist(list):
...     def __hash__(self):
...         return len(self)
... 
>>> stupid = stupidlist([1, 2, 3])
>>> d = {stupid: 0}
>>> stupid.append(4)
>>> stupid
[1, 2, 3, 4]
>>> d
{[1, 2, 3, 4]: 0}
>>> stupid in d
False
>>> stupid in d.keys()
False
>>> stupid in list(d.keys())
True

After mutating stupid, it cannot be found in the dict any longer because the hash changed. Only a linear scan over the list of the dict's keys finds stupid.

Example 2: ... but why not just a constant hash value?

>>> class stupidlist2(list):
...     def __hash__(self):
...         return id(self)
... 
>>> stupidA = stupidlist2([1, 2, 3])
>>> stupidB = stupidlist2([1, 2, 3])
>>> 
>>> stupidA == stupidB
True
>>> stupidA in {stupidB: 0}
False

That's not a good idea as well because equal objects should hash identically such that you can find them in a dict or set.

Example 3: ... ok, what about constant hashes across all instances?!

>>> class stupidlist3(list):
...     def __hash__(self):
...         return 1
... 
>>> stupidC = stupidlist3([1, 2, 3])
>>> stupidD = stupidlist3([1, 2, 3])
>>> stupidE = stupidlist3([1, 2, 3, 4])
>>> 
>>> stupidC in {stupidD: 0}
True
>>> stupidC in {stupidE: 0}
False
>>> d = {stupidC: 0}
>>> stupidC.append(5)
>>> stupidC in d
True

Things seem to work as expected, but think about what's happening: when all instances of your class produce the same hash value, you will have a hash collision whenever there are more than two instances as keys in a dict or present in a set.

Finding the right instance with d[key] or key in d needs to perform as many equality checks as there are instances of stupidlist3 in the dict's keys. At this point, the purpose of the dictionary - O(1) lookup - is completely defeated. This is demonstrated in the following timings (done with IPython).

Some Timings

>>> lists_list = [[i]  for i in range(1000)]
>>> stupidlists_set = {stupidlist3([i]) for i in range(1000)}
>>> tuples_set = {(i,) for i in range(1000)}
>>> l = [999]
>>> s = stupidlist3([999])
>>> t = (999,)
>>> 
>>> %timeit l in lists_list
25.5 µs ± 442 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit s in stupidlists_set
38.5 µs ± 61.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit t in tuples_set
77.6 ns ± 1.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

As you can see, the membership test in our stupidlists_set is even slower than a linear scan over the whole lists_list, while you have the expected super fast lookup time (factor 500) in a set without loads of hash collisions.


TL; DR: you can use tuple(yourlist) as dict keys, because tuples are immutable and hashable.

like image 167
timgeb Avatar answered Feb 12 '23 03:02

timgeb


You can use a tuple as a dictionary key instead:

data = { 
    (24,48,96): ["QN.FN.EQ", "OT.AR.LN", "BL.VL.TR"] 
}

print data[(24,48,96)]
like image 27
Felipe Ferri Avatar answered Feb 12 '23 04:02

Felipe Ferri