How Python stores the dict key, values when collision occurs in hash table? Whats the hash algorithm used to get the hash value here?
Python uses a method called Open Addressing for handling collisions. It also resizes the hash tables when it reaches a certain size, but we won't discuss that aspect. Open Addressing definition from Wikipedia: In another strategy, called open addressing, all entry records are stored in the bucket array itself.
Dictionary literals are constructed in the order given in the source. This means that if a key is duplicated the second key-value pair will overwrite the first as a dictionary can only have one value per key.
Dictionaries in Python First, a given key can appear in a dictionary only once. Duplicate keys are not allowed.
Properties of Dictionary Keys When duplicate keys are encountered during the assignment, the value will be the last assigned one. Keys must be immutable. This means you can use strings, numbers, or tuples as dictionary keys, but something like ['key'] is not allowed.
For the "normal" Python, this great writeup by Praveen Gollakota explains it very well, here are the important bits:
O(1)
lookup by index). <hash, key, value>
. This is implemented as a C struct (see dictobject.h:51-56).i
that is based on the hash of the key. CPython uses initial i = hash(key) & mask
, where mask = PyDictMINSIZE - 1
, but that's not really important. Just note that the initial slot, i
, that is checked depends on the hash of the key.<hash|key|value>
). But what if that slot is occupied!? Most likely because another entry has the same hash (hash collision!)i+1
, i+2
, ... and use the first available one (that's linear probing). But for reasons explained beautifully in the comments (see dictobject.c:33-126), CPython uses random probing. In random probing, the next slot is picked in a pseudo random order. The entry is added to the first empty slot. For this discussion, the actual algorithm used to pick the next slot is not really important (see dictobject.c:33-126 for the algorithm for probing). What is important is that the slots are probed until first empty slot is found.i
(where i
depends on the hash of the key). If the hash and the key both don't match the entry in the slot, it starts probing, until it finds a slot with a match. If all slots are exhausted, it reports a fail.The short version: The Python spec doesn't specify a dictionary implementation, but CPython uses a hash map and handles collisions with open addressing.
See this answer to a similar question and also the Wikipedia page on hash tables.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With