I have a question about Python's dictionary implementation.
It looks like Python maintains a search order for all keys. For example, if you do the following:
a = {}
a[3] = 1
a[0] = 2
# a is now {0: 2, 3: 1}
Python automatically changes my insertion order. Since Python says that a dict is an unordered collection, I don't quite understand why it maintains such a search order. Does Python implement dict as a hash table and store a separate set for index ordering?
Hopefully I've made the question clear.
Thank you.
The order of a dict is completely determined by the hash function of its keys (and by insertion order if there are hash collisions). Integers hash to themselves (at least up to sys.maxint):
>>> hash(1)
1
The CPython implementation takes the hash value of the object and uses a few of its bits to determine the index in the table. How many bits it uses depends on the size of the table. By default, the dict has 8 available slots, so the numbers 0 and 8 will collide. We can see this as follows:
>>> d1 = {}
>>> d1[0] = 'foo'
>>> d1[8] = 'bar'
>>> d1
{0: 'foo', 8: 'bar'}
>>>
>>> d2 = {}
>>> d2[8] = 'bar'
>>> d2[0] = 'foo'
>>> d2
{8: 'bar', 0: 'foo'}
Since 0 and 8 collided in our dictionary, insertion order appears to have been maintained. 0 takes the first available slot (after all, no matter how many bits you take from 0, you'll get 0). 8 tries to take that slot as well. If that slot is taken, however, collision resolution takes over and Python inserts that value in some later slot.
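To make the collision behaviour concrete, here is a minimal sketch of a toy open-addressing table. The function name slot_order is made up for illustration, and linear probing is a deliberate simplification: CPython's real probe sequence is more involved. On Python 3.7 and later, real dicts iterate in insertion order, so the outputs shown above will not reproduce there; the toy table only imitates the older layout.

```python
def slot_order(keys, size=8):
    """Return keys in slot order for a toy open-addressing hash table.

    Assumes size is a power of two, keys are distinct, and there are
    fewer keys than slots. This is an illustration, not CPython's code.
    """
    slots = [None] * size
    mask = size - 1
    for key in keys:
        i = hash(key) & mask          # low bits of the hash pick the preferred slot
        while slots[i] is not None:   # slot taken: a collision
            i = (i + 1) & mask        # linear probing (simplified collision resolution)
        slots[i] = key
    # "Iteration order" here is simply the order of the occupied slots.
    return [k for k in slots if k is not None]

print(slot_order([0, 8]))  # [0, 8]
print(slot_order([8, 0]))  # [8, 0]
print(slot_order([3, 0]))  # [0, 3]
```

The first two calls mirror d1 and d2: whichever of 0 and 8 arrives first gets slot 0 and the other is pushed to a later slot. The last call mirrors the question, where 3 and 0 do not collide and so come out in slot order.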
Of course, if your dictionary happens to have more than ~5 elements, it will be resized (I think to 16, but don't quote me on that) and 0 and 8 will no longer collide:
>>> d1 = {x:x for x in range(1, 6)}
>>> d1[0] = 0
>>> d1[8] = 8
>>> d1
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 8: 8}
>>> d2 = {x:x for x in range(1, 6)}
>>> d2[8] = 8
>>> d2[0] = 0
>>> d2
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 8: 8}
Note that sorted order (not insertion order) is preserved, which means that every integer got its preferred spot in the hash table (no collisions). I think that the dict gets resized when it is about two-thirds full.
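One rough way to watch resizing happen is to track sys.getsizeof while inserting keys. This is only a sketch and assumes CPython; the exact byte counts and the item counts at which the size jumps depend on the interpreter version, so no specific numbers are claimed here.

```python
import sys

d = {}
last_size = sys.getsizeof(d)
growth_points = []  # (number of items, new size in bytes) at each jump

for n in range(1, 101):
    d[n] = n
    size = sys.getsizeof(d)
    if size != last_size:
        # The reported size changed, so the dict reallocated its storage.
        growth_points.append((n, size))
        last_size = size

for n, size in growth_points:
    print(f"after inserting item {n}: {size} bytes")
```

The size grows in a few large steps rather than on every insertion, which is consistent with the table being resized only once it passes some fill threshold.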
Note that this is purely academic: the Python language specification doesn't say this is how it works, so it could change at any time. Please don't rely on this behavior. Most of this can be gleaned from comments in the CPython source code and the documentation that sits next to it.
Dict index ordering is just a consequence of how the dict is implemented, and should not be relied on.
To be precise, Python doesn't change your insertion order (since that is just defined to be the order you insert items into the dict), but the iteration order has no guarantees.
When Python creates a dict, it creates enough space for 8 key-value pairs (I think). For an empty dict, none of them are filled. Whenever you put an item into a dict, Python takes the hash of the key, and that hash determines which slot the item goes into.
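As a rough illustration of how a hash can be turned into a slot index, the sketch below keeps only the low bits of the hash. The helper name preferred_slot and the table_size parameter are made up for this example; CPython's actual lookup also handles collisions and differs between versions.

```python
def preferred_slot(key, table_size=8):
    # table_size is assumed to be a power of two, so (table_size - 1)
    # is a bit mask that keeps only the low bits of the hash.
    return hash(key) & (table_size - 1)

for key in (0, 3, 8, 11):
    print(key, "->", preferred_slot(key))
# 0 -> 0
# 3 -> 3
# 8 -> 0
# 11 -> 3

print(preferred_slot(8, table_size=16))  # 8
```

With 8 slots, 0 and 8 want the same slot; with 16 slots they no longer do.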
If you do want the iteration order to be the same as the insertion order, check out collections.OrderedDict. (Since Python 3.7, plain dicts also guarantee insertion order.)
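A minimal example using collections.OrderedDict from the standard library, with the same keys as in the question:

```python
from collections import OrderedDict

od = OrderedDict()
od[3] = 1
od[0] = 2

# Iteration follows the order in which the keys were inserted.
print(list(od.items()))  # [(3, 1), (0, 2)]
```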