Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Accessing dictionary items by position in Python 3.6+ efficiently

I understand dictionaries are insertion ordered in Python 3.6+, as an implementation detail in 3.6 and official in 3.7+.

Given they are ordered, it seems strange that no methods exist to retrieve the ith item of a dictionary by insertion order. The only solutions available appear to have O(n) complexity, either:

  1. Convert to a list via an O(n) process and then use list.__getitem__.
  2. enumerate dictionary items in a loop and return the value when the desired index is reached. Again, with O(n) time complexity.

Since getting an item from a list has O(1) complexity, is there a way to achieve the same complexity with dictionaries? Either with regular dict or collections.OrderedDict would work.

If it's not possible, is there a structural reason preventing such a method, or is this just a feature which has not yet been considered / implemented?

like image 700
jpp Avatar asked Sep 25 '18 23:09

jpp


People also ask

Do dictionaries in Python 3.7 have order?

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6. Changed in version 3.7: Dictionary order is guaranteed to be insertion order.

Can you index through a dictionary in Python?

Python dictionary index numberUse the list[index] function to get index numbers from the dictionary. It will return the key and also use the items() function to return a collection from a dictionary.

How do you filter a dictionary based on values?

Filter a Dictionary by values in Python using filter() filter() function iterates above all the elements in passed dict and filter elements based on condition passed as callback.


1 Answers

For an OrderedDict it's inherently O(n) because the ordering is recorded in a linked list.

For the builtin dict, there's a vector (a contiguous array) rather than a linked list, but pretty much the same thing in the end: the vector contains a few kind of "dummies", special internal values that mean "no key has been stored here yet" or "a key used to be stored here but no longer". That makes, e.g., deleting a key extremely cheap (just overwrite the key with a dummy value).

But without adding auxiliary data structures on top of that, there's no way to skip over the dummies without marching over them one at a time. Because Python uses a form of open addressing for collision resolution, and keeps the load factor under 2/3, at least a third of the vector's entries are dummies. the_vector[i] can be accessed in O(1) time, but really has no predictable relation to the i'th non-dummy entry.

like image 94
Tim Peters Avatar answered Sep 19 '22 16:09

Tim Peters