Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can you slice with string keys instead of integers on a python OrderedDict?

Since an OrderedDict has the features of both a list (with ordered elements), and a dictionary (with keys instead of indexes), it would seem natural that you could slice using keys.

>>> from collections import OrderedDict
>>> cities = OrderedDict((('san francisco', 650), ('new york', 212), ('shanghai', 8621), ('barcelona', 42423)))
>>> test['shanghai':]  # I want all the cities from shanghai to the end of the list
TypeError: unhashable type

What's interesting about this is that it's not the error you'd see due to OrderedDictionary.__getslice__ not being implemented. I tried adding my own __getslice__ method to OrderedDict, but I keep running into this TypeError problem. It seems like Python is doing some kind of type checking to enforce that slice keys are only integers, before they even get passed to the __getslice__ function, how unpythonic!

>>> class BetterOrderedDict(OrderedDict):
        def __getslice__(self, start=None, end=None, step=1):
            return 'potato'

>>> test = BetterOrderedDict((('one', 1), ('two', 2), ('three', 3), ('four', 4)))
>>> print test[1:4]
'potato'                           # ok this makes sense so far

>>> test['one':'four']
TypeError: unhashable type         # WTF, strings are hashable!

So my question is, why can't I implement non-int slices, what kind of type-checking is preventing the slice keys from even reaching my __getslice__ function, and can I override it by implementing my BetterOrderedDict in C with bindings?

like image 914
Nick Sweeting Avatar asked Jan 12 '15 23:01

Nick Sweeting


People also ask

What is OrderedDict in Python?

Python's OrderedDict is a dict subclass that preserves the order in which key-value pairs, commonly known as items, are inserted into the dictionary. When you iterate over an OrderedDict object, items are traversed in the original order. If you update the value of an existing key, then the order remains unchanged.

Do dictionaries support slicing Python?

With Python, we can easily slice a dictionary to get just the key/value pairs we want. To slice a dictionary, you can use dictionary comprehension. In Python, dictionaries are a collection of key/value pairs separated by commas.

Are dictionary Keys ordered Python?

Answer. No, there is no guaranteed order for the list of keys returned by the keys() function. In most cases, the key list is returned in the same order as the insertion, however, that behavior is NOT guaranteed and should not be depended on by your program.

How do I create an OrderedDict?

OrderedDict is part of python collections module. We can create an empty OrderedDict and add items to it. If we create an OrderedDict by passing a dict argument, then the ordering may be lost because dict doesn't maintain the insertion order. If an item is overwritten in the OrderedDict, it's position is maintained.


3 Answers

__getslice__ is deprecated way of implementing slicing. Instead you should handle slice objects with __getitem__:

from collections import OrderedDict

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            return 'potato({},{},{})'.format(key.start, key.stop, key.step)
        return super(SlicableDict, self).__getitem__(key)

>>> s = SlicableDict(a=1, b=2, c=3)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2)])
>>> s['a']
1
>>> s['a':'c']
'potato(a,c,None)'

And if you need more than potato, than you can implement all three slicing operations following way:

def _key_slice_to_index_slice(items, key_slice):
    try:
        if key_slice.start is None:
            start = None
        else:
            start = next(idx for idx, (key, value) in enumerate(items)
                         if key == key_slice.start)
        if key_slice.stop is None:
            stop = None
        else:
            stop = next(idx for idx, (key, value) in enumerate(items)
                        if key == key_slice.stop)
    except StopIteration:
        raise KeyError
    return slice(start, stop, key_slice.step)

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            return SlicableDict(items[index_slice])
        return super(SlicableDict, self).__getitem__(key)

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            items[index_slice] = value.items()
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__setitem__(key, value)

    def __delitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            del items[index_slice]
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__delitem__(key)
like image 150
zch Avatar answered Oct 02 '22 16:10

zch


This is the actual implementation of the slicing feature you are expecting.

OrderedDict internally maintains the order of the keys in the form of a doubly linked list. Quoting the actual comment from Python 2.7.9,

# The internal self.__map dict maps keys to links in a doubly linked list.
# The circular doubly linked list starts and ends with a sentinel element.
# The sentinel element never gets deleted (this simplifies the algorithm).
# Each link is stored as a list of length three:  [PREV, NEXT, KEY].

Now, to slice the dictionary, we need to iterate the doubly linked list, __root, which is actually a private variable, protected by the name mangling mechanism.

Note: This involves hacky name unmangling to use the OrderedDict's internal data structures.

from collections import OrderedDict

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            # Unmangle `__root` to access the doubly linked list
            root = getattr(self, "_OrderedDict__root")
            # By default, make `start` as the first element, `end` as the last
            start, end = root[1][2], root[0][2]
            start = key.start or start
            end = key.stop or end
            step = key.step or 1
            curr, result, begun, counter = root[1], [], False, 0

            # Begin iterating
            curr, result, begun = root[1], [], False
            while curr is not root:
                # If the end value is reached, `break` and `return`
                if curr[2] == end:
                    break
                # If starting value is matched, start appending to `result`
                if curr[2] == start:
                    begun = True
                if begun:
                    if counter % step == 0:
                        result.append((curr[2], self[curr[2]]))
                    counter += 1

                # Make the `curr` point to the next element
                curr = curr[1]

            return result

        return super(SlicableDict, self).__getitem__(key)

Few sample runs:

>>> s = SlicableDict(a=1, b=2, c=3, d=4)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4), ('f', 6)])
>>> s['a':'c']
[('a', 1)]
>>> s['a':]
[('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4)]
>>> s[:'a']
[]
>>> s['a':'f':2]
[('a', 1), ('b', 2), ('d', 4)]
like image 28
thefourtheye Avatar answered Oct 02 '22 16:10

thefourtheye


Try this (very ugly) implementation

class SliceOrdered(OrderedDict):

    def __getitem__(self, key):
        if isinstance(key, slice):
            tmp = OrderedDict()
            i_self = iter(self)
            for k in i_self:
                if key.start <= k <= key.stop:
                    tmp[k] = self[k]
                    if key.step is not None and key.step > 1:
                        for _ in range(key.step-1):
                            try:
                                next(i_self)
                            except StopIteration:
                                break
            return tmp
        else:
            return super(SliceOrdered, self).__getitem__(key)

DEMO (Python3.4)

>>> s = SliceOrdered([('a',2), ('b',2), ('c',3), ('d',4)])
>>> s['a':'c']
OrderedDict([('a', 2), ('b', 2), ('c', 3)])
>>> s['a':'d':2]
OrderedDict([('a', 2), ('c', 3)])

N.B. this probably only works because in this example, the OrderedDict was not only ordered, but also sorted. In an unsorted dictionary the slice 'a':'c' does not necessary contain 'b', so my if key.start <= k <= key.stop logic probably fails. The following code should respect that:

class SliceOrdered(OrderedDict):
    def __getitem__(self, key):
        if not isinstance(key, slice):
            return super(SliceOrdered,self).__getitem__(key)
        tmp = OrderedDict()
        step = key.step or 1
        accumulating = False
        i_self = iter(self)
        for k in i_self:
            if k == key.start:
                accumulating = True
            if accumulating:
                tmp[k] = self[k]
                for _ in range(step-1):
                    next(i_self)
            if k == key.stop:
                accumulating = False
                break
        return tmp
like image 45
Adam Smith Avatar answered Sep 29 '22 16:09

Adam Smith