Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does overriding __contains__ break OrderedDict.keys?

I'm subclasssing OrderedDict (Cpython, 2.7.3) to represent a datafile. __getitem__ pulls a field out of the datafile and sets it on the current instance similar to the code I've posted below. now I would like to override __contains__ to return True if the field is in the dictionary or in the file on the disk since it can be read either way. However, this seems to break OrderedDict's ability to inspect it's keys.

from collections import OrderedDict

dictclass = OrderedDict

class Foo(dictclass):
    def __getitem__(self,key):
        try:
            return dictclass.__getitem__(self,key)
        except KeyError:
            pass

        data = key*2
        self[key] = data
        return data

    def __contains__(self,whatever):
        return dictclass.__contains__(self,whatever) or 'bar' in whatever

a = Foo()
print a['bar']
print a.keys()

If you run the code above, you'll get this output:

barbar
[]

Note that if you change dictclass = dict in the above code, it still seems to work (giving the following output).

barbar
['bar']

Am I doing something horribly wrong?

like image 459
mgilson Avatar asked Mar 09 '13 22:03

mgilson


2 Answers

When Foo.__contains__ is not defined:

a['bar']

calls Foo.__getitem__, which executes

    self[key] = data

This calls OrderedDict.__setitem__, which is defined this way:

def __setitem__(self, key, value, PREV=0, NEXT=1, dict_setitem=dict.__setitem__):
    'od.__setitem__(i, y) <==> od[i]=y'
    # Setting a new item creates a new link at the end of the linked list,
    # and the inherited dictionary is updated with the new key/value pair.
    if key not in self:
        root = self.__root
        last = root[PREV]
        last[NEXT] = root[PREV] = self.__map[key] = [last, root, key]
    dict_setitem(self, key, value)

Since Foo.__contains__ is not defined,

    if key not in self:

is True. So the key is properly added to self.__root and self.__map.

When Foo.__contains__ is defined,

    if key not in self:

if False. So the key is not properly added to self.__root and self.__map. Foo.__contains__ effective fools OrderedDict.__setitem__ into thinking that the 'bar' key has already been added.


I found it helpful to play with the following code (adding print statements in __setitem__ and __iter__):

from collections import OrderedDict

dictclass = OrderedDict

class Foo(dictclass):
    def __getitem__(self,key):
        try:
            return dictclass.__getitem__(self,key)
        except KeyError:
            pass

        data = key*2
        self[key] = data
        return data

    def __contains__(self,whatever):
        print('contains: {}'.format(whatever))
        return dictclass.__contains__(self,whatever) or 'bar' in whatever

    def __setitem__(self, key, value, PREV=0, NEXT=1, dict_setitem=dict.__setitem__):
        'od.__setitem__(i, y) <==> od[i]=y'
        # Setting a new item creates a new link at the end of the linked list,
        # and the inherited dictionary is updated with the new key/value pair.
        print('key not in self: {}'.format(key not in self))
        if key not in self:
            root = self._OrderedDict__root
            last = root[PREV]
            last[NEXT] = root[PREV] = self._OrderedDict__map[key] = [last, root, key]
        dict_setitem(self, key, value)

    def __iter__(self):
        'od.__iter__() <==> iter(od)'
        # Traverse the linked list in order.
        NEXT, KEY = 1, 2

        root = self._OrderedDict__root
        curr = root[NEXT]
        print('curr: {}'.format(curr))
        print('root: {}'.format(root)) 
        print('curr is not root: {}'.format(curr is not root))

        while curr is not root:
            yield curr[KEY]
            curr = curr[NEXT]

a = Foo()
print a['bar']
# barbar

print a.keys()
# ['bar']

Notice that you can avoid this problem by making Foo a subclass of collections.MutableMapping and delegating most of its behavior to a OrderedDict attribute:

import collections
dictclass = collections.OrderedDict

class Foo(collections.MutableMapping):
    def __init__(self, *args, **kwargs):
        self._data = dictclass(*args, **kwargs)
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

    def __getitem__(self,key):
        try:
            return self._data[key]
        except KeyError:
            pass

        data = key*2
        self[key] = data
        return data

    def __contains__(self,whatever):
        return dictclass.__contains__(self,whatever) or 'bar' in whatever

which yields

a = Foo()
print a['bar']
# barbar

print a.keys()
# ['bar']

even with __contains__ defined.

like image 154
unutbu Avatar answered Sep 28 '22 02:09

unutbu


What breaks your code is the or 'bar' in whatever. If you remove it, it will work as with the change dictclass = dict you mention.

The __setitem__ implementation of OrderedDict is this:

def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
    'od.__setitem__(i, y) <==> od[i]=y'
    # Setting a new item creates a new link at the end of the linked list,
    # and the inherited dictionary is updated with the new key/value pair.
    if key not in self:
        root = self.__root
        last = root[0]
        last[1] = root[0] = self.__map[key] = [last, root, key]
    return dict_setitem(self, key, value)

So with self["bar"] = "barbar", the condition should be False, but it is True even before inserting any item. Thus, the key isn' added to self.__root which is used in OrderedDict.__iter__:

def __iter__(self):
    'od.__iter__() <==> iter(od)'
    # Traverse the linked list in order.
    root = self.__root
    curr = root[1]                                  # start at the first node
    while curr is not root:
        yield curr[2]                               # yield the curr[KEY]
        curr = curr[1]                              # move to next node

Since the code for retrieving the values uses this iterator and self.__root does not contain "bar", this concrete key cannot be returned in the values.

like image 26
A. Rodas Avatar answered Sep 28 '22 02:09

A. Rodas