Key-ordered dict in Python

I am looking for a solid implementation of an ordered associative array, that is, an ordered dictionary. I want the ordering in terms of keys, not of insertion order.

More precisely, I am looking for a space-efficient implementation of an int-to-float (or string-to-float for another use case) mapping structure for which:

  • Ordered iteration is O(n)
  • Random access is O(1)

The best I came up with was gluing a dict and a list of keys, keeping the last one ordered with bisect and insert.

Any better ideas?

LeMiz asked Aug 23 '09 22:08



1 Answer

"Random access O(1)" is an extremely exacting requirement which basically imposes an underlying hash table -- and I hope you do mean random READS only, because I think it can be mathematically proven that it's impossible in the general case to have O(1) writes as well as O(N) ordered iteration (otherwise N writes followed by one iteration would sort arbitrary keys in O(N), beating the O(N log N) lower bound for comparison-based sorting).

I don't think you will find a pre-packaged container suited to your needs because they are so extreme -- O(log N) access would of course make all the difference in the world. To get the big-O behavior you want for reads and iterations you'll need to glue two data structures, essentially a dict and a heap (or sorted list or tree), and keep them in sync. Although you don't specify, I think you'll only get amortized behavior of the kind you want - unless you're truly willing to pay any performance hits for inserts and deletes, which is the literal implication of the specs you express but does seem a pretty unlikely real-life requirement.
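The asker's own idea, a dict glued to a key list kept ordered with bisect, is one way to do that gluing eagerly. Here is a minimal sketch of it; the class name BisectDict and its attribute names are made up for illustration and do not come from the question. Reads stay O(1) and ordered iteration is O(N), but every new key pays an O(N) shift in the list.

```python
import bisect


class BisectDict:
    """Hypothetical sketch: a dict plus a key list kept sorted on every write."""

    def __init__(self):
        self._data = {}   # key -> value, gives O(1) average reads
        self._keys = []   # all keys, always in sorted order

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key not in self._data:
            # O(log N) search for the slot, then an O(N) shift to insert.
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]  # raises KeyError if the key is absent
        index = bisect.bisect_left(self._keys, key)
        del self._keys[index]  # O(N) shift

    def __iter__(self):
        # The key list is already sorted, so iteration is a plain O(N) walk.
        return iter(self._keys)

    def __len__(self):
        return len(self._data)


mapping = BisectDict()
mapping[3] = 0.5
mapping[1] = 1.5
mapping[2] = 2.5
print([(k, mapping[k]) for k in mapping])
```

Compared with sorting lazily at iteration time, this keeps the key list valid at every moment, at the price of a linear-time shift per new key.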

For O(1) read and amortized O(N) ordered iteration, just keep a list of all keys on the side of a dict. E.g.:

    class Crazy(object):
        def __init__(self):
            self.d = {}
            self.L = []
            self.sorted = True
        def __getitem__(self, k):
            return self.d[k]
        def __setitem__(self, k, v):
            if k not in self.d:
                self.L.append(k)
                self.sorted = False
            self.d[k] = v
        def __delitem__(self, k):
            del self.d[k]
            self.L.remove(k)
        def __iter__(self):
            if not self.sorted:
                self.L.sort()
                self.sorted = True
            return iter(self.L)

If you don't like the "amortized O(N) order" you can remove self.sorted and just repeat self.L.sort() in __setitem__ itself. That makes writes O(N log N), of course (while I still had writes at O(1)). Either approach is viable and it's hard to think of one as intrinsically superior to the other. If you tend to do a bunch of writes then a bunch of iterations then the approach in the code above is best; if it's typically one write, one iteration, another write, another iteration, then it's just about a wash.
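The variant just described, with self.sorted removed and the sort moved into __setitem__, might look like the sketch below. The class name EagerCrazy is made up here for illustration; everything else mirrors the Crazy class.

```python
class EagerCrazy:
    """Hypothetical variant of Crazy: re-sort the key list on every new key."""

    def __init__(self):
        self.d = {}
        self.L = []

    def __getitem__(self, k):
        return self.d[k]

    def __setitem__(self, k, v):
        if k not in self.d:
            self.L.append(k)
            # Sort right away, so the list is never left unsorted.
            self.L.sort()
        self.d[k] = v

    def __delitem__(self, k):
        del self.d[k]
        self.L.remove(k)  # removing a key keeps the list sorted

    def __iter__(self):
        # No flag to check: the list is already in key order.
        return iter(self.L)


e = EagerCrazy()
for key, value in [(5, 0.1), (2, 0.2), (9, 0.3)]:
    e[key] = value
print(list(e))
```

Iteration no longer has a first-call penalty, since the sorting cost has moved entirely onto the writes.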

BTW, this takes shameless advantage of the unusual (and wonderful;-) performance characteristics of Python's sort (aka "timsort"): among them, sorting a list that's mostly sorted but with a few extra items tacked on at the end is basically O(N) (if the tacked-on items are few enough compared to the sorted prefix part). I hear Java's gaining this sort soon, as Josh Bloch was so impressed by a tech talk on Python's sort that he started coding it for the JVM on his laptop then and there. Most systems (including I believe Jython as of today and IronPython too) basically have sorting as an O(N log N) operation, not taking advantage of "mostly ordered" inputs; "natural mergesort", which Tim Peters fashioned into Python's timsort of today, is a wonder in this respect.
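One rough way to see this effect without relying on wall-clock timing is to count the comparisons list.sort() makes. The sketch below assumes CPython's list.sort(); the names Counted and comparisons_to_sort are hypothetical helpers written for this illustration, and the sizes are arbitrary.

```python
import random


class Counted:
    """Wraps a value and counts every `<` comparison made during sorting."""

    count = 0  # shared counter, reset before each measurement
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.count += 1
        return self.value < other.value


def comparisons_to_sort(values):
    """Return how many `<` comparisons list.sort() makes on these values."""
    Counted.count = 0
    items = [Counted(v) for v in values]
    items.sort()  # list.sort() only uses __lt__
    return Counted.count


rng = random.Random(0)
n = 10_000

# A long sorted prefix with a handful of new keys tacked on at the end.
nearly_sorted = list(range(n)) + [rng.randrange(n) for _ in range(10)]

# The same values in random order, for comparison.
shuffled = nearly_sorted[:]
rng.shuffle(shuffled)

print("nearly sorted:", comparisons_to_sort(nearly_sorted))
print("shuffled:     ", comparisons_to_sort(shuffled))
```

On CPython the first count should come out far smaller than the second, which is what makes the lazy append-then-sort approach cheap when only a few keys were added since the last iteration.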

Alex Martelli answered Oct 06 '22 14:10