Suppose I have a dictionary
{1:5, 2:5, 4:5}
Is there a data structure such that if I add the key-value pair 3:5
, to have it entered in the dictionary so that the keys are in sorted order? i.e.
{1:5, 2:5, 3:5, 4:5}
I am aware of the collections.OrderedDict()
, but this only keeps the keys in the order in which they were added (which isn't sufficient for me currently).
I don't want to have to use a normal dictionary dic = {}
, then have to use sorted(dic)[0]
to grab the smallest key. I'd rather have sorted_dict[0]
type function.
The reason for this is that if I use a normal dictionary, I will have to call the sort multiple times, as I am continuously adding pairs to my dictionary.
EDIT: I should have mentioned, it's not only the smallest and largest keys I care about, I will also need to print this dictionary at regular intervals as well...
Need for Sorting in Dictionary First, sort the keys alphabetically using key_value.iterkeys() function. Second, sort the keys alphabetically using the sorted (key_value) function & print the value corresponding to it. Third, sort the values alphabetically using key_value.iteritems(), key = lambda (k, v) : (v, k))
Dictionary is an important data structure that stores data by mapping keys with values. The default dictionaries in Python are unordered data structures. Like lists, we can use the sorted() function to sort the dictionary by keys. However, it will only return a list of sorted keys, which is usually not what we desire.
If you plan to add and remove keys from the dictionary continuously, you really want something that uses an appropriate data structure for the problem—not a hash table (or a hash table plus a list, as with the SortedOrderedDict
-type recipes), but a balanced tree (or an equivalent, like a skip list).
If you look around at PyPI, you'll find a number of options. My recommendation would be blist
. Even though its data structure may not be quite as optimal as some of the others (because a B+Tree is much broader than a binary tree), it's probably good enough for almost any use case you will run into it. And it has a complete and well-tested interface, including well-tested performance guarantees. And it's used quite a bit by other serious projects.
If you're dealing with one of the rare cases where the tree performance really is critical, you should probably look at the various red-black tree, splay tree, skiplist, etc. implementations. I've used bintrees
before, which has a great interface (e.g., you can access the keys and values by index, and even slice the tree, as well as treating it like a dict
, and the author has thought through and avoided all the potential ambiguities), but I haven't seriously performance tested it.
Or, if your keys and values really are all smallish integers, you might want to consider using Cython to wrap a C++ map<int, int>
in a Pythonic interface. (It's not quite possible to provide a complete interface on top of C++ map
, but you often don't need that anyway.) Or, alternatively, modify one of the implementations like bintrees.FastRBTree
to store and compare long
instead of PyObject*
.
On the other hand, if you're just going to create the dictionary all at once and then use it, there's a much simpler answer. Sort it, and stick it in an OrderedDict
. Then you don't need anything outside the stdlib.
sorted_dict = collections.OrderedDict(sorted(d.iteritems()))
From a comment on another answer, you say "i don't have permissions to install new modules..."
First, make sure that's really true. You probably do have permission to install modules in a user site-packages directory. Or, if virtualenv
is installed and/or you're using 3.3 with built-in venv
, even better, you probably have permission to create a venv and install modules into that.
But if so, what you need to do is copy the files from blist
/bintrees
/whatever into your project.
The problem you might run into is that most of these packages contain C extension modules, which means you have to be able to build them (well, build_ext -i
them). If your system doesn't have the Python dev files and a compiler tool chain installed, you can't do that. In that case, you're looking for the best pure-Python solution. bintrees
comes with a pure-Python implementation that's identical to the normal C-extension implementation, except slower. It's still O(log N) of course, just the constant factor is a lot higher. If N is big enough, it's still a huge win; if not, it may not be.
If any part of this sounds reasonable, but you need help setting up a per-user site-packages or virtual env, or copying a module into your project in-place, or building extensions in-place, etc., you should probably search for existing questions, and ask a new one if you can't find one (if for no other reason than because the kinds of people who are experts at installation issues aren't necessarily experts at data structures, and may not even be reading this question).
Try this recipe — http://code.activestate.com/recipes/576998-sorted-dictionary/
It keeps keys sorted using stdlib bisect module.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With