Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How would I implement a dict with Abstract Base Classes in Python? [duplicate]

I attempted to implement a mapping in Python by using the abstract base class, MutableMapping, but I got an error on instantiation. How would I go about making a working version of this dictionary that would emulate the builtin dict class, in as many ways as possible, to be clear, with Abstract Base Classes?

>>> class D(collections.MutableMapping):
...     pass
... 
>>> d = D()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class D with abstract methods __delitem__, __getitem__, __iter__, __len__, __setitem__

A good answer will demonstrate how to make this work, specifically without subclassing dict (a concept that I am quite familiar with).

like image 690
Russia Must Remove Putin Avatar asked Jan 26 '14 07:01

Russia Must Remove Putin


2 Answers

How would I implement a dict with Abstract Base Classes?

A good answer will demonstrate how to make this work, specifically without subclassing dict.

Here's the error message: TypeError: Can't instantiate abstract class D with abstract methods __delitem__, __getitem__, __iter__, __len__, __setitem__

It turns out that one must implement them to use the Abstract Base Class (ABC), MutableMapping.

Implementation

So I implement a mapping that works like a dict in most respects that uses the object's attribute reference dict for the mapping. (Delegation is not the same as inheritance, so we'll just delegate to the instance __dict__, we could use any other ad-hoc mapping, but you don't seem to care about that part of the implementation. It makes sense to do it this way in Python 2, because MutableMapping doesn't have __slots__ in Python 2, so you're creating a __dict__ either way. In Python 3, you could avoid dicts altogether by setting __slots__.)

from collections.abc import MutableMapping

class D(MutableMapping):
    '''
    Mapping that works like both a dict and a mutable object, i.e.
    d = D(foo='bar')
    and 
    d.foo returns 'bar'
    '''
    # ``__init__`` method required to create instance from class.
    def __init__(self, *args, **kwargs):
        '''Use the object dict'''
        self.__dict__.update(*args, **kwargs)
    # The next five methods are requirements of the ABC.
    def __setitem__(self, key, value):
        self.__dict__[key] = value
    def __getitem__(self, key):
        return self.__dict__[key]
    def __delitem__(self, key):
        del self.__dict__[key]
    def __iter__(self):
        return iter(self.__dict__)
    def __len__(self):
        return len(self.__dict__)
    # The final two methods aren't required, but nice for demo purposes:
    def __str__(self):
        '''returns simple dict representation of the mapping'''
        return str(self.__dict__)
    def __repr__(self):
        '''echoes class, id, & reproducible representation in the REPL'''
        return '{}, D({})'.format(super(D, self).__repr__(), 
                                  self.__dict__)

Demonstration

And to demonstrate the usage:

>>> d = D((e, i) for i, e in enumerate('abc'))
>>> d
<__main__.D object at 0x7f75eb242e50>, D({'b': 1, 'c': 2, 'a': 0})
>>> d.a
0
>>> d.get('b')
1
>>> d.setdefault('d', []).append(3)
>>> d.foo = 'bar'
>>> print(d)
{'b': 1, 'c': 2, 'a': 0, 'foo': 'bar', 'd': [3]}

And for ensuring the dict API, lesson learned is that you can always check for collections.abc.MutableMapping:

>>> isinstance(d, MutableMapping)
True
>>> isinstance(dict(), MutableMapping)
True

And while a dict is always going to be an instance of a MutableMapping due to registration on collections import, the reverse is not always true:

>>> isinstance(d, dict)
False
>>> isinstance(d, (dict, MutableMapping))
True

After performing this exercise, it is clear to me that using Abstract Base Classes provides only the guarantee of a standard API for users of the class. In this case, users assuming a MutableMapping object will be guaranteed the standard API for Python.

Caveats:

The fromkeys class constructor method is not implemented.

>>> dict.fromkeys('abc')
{'b': None, 'c': None, 'a': None}
>>> D.fromkeys('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'D' has no attribute 'fromkeys'

One could mask the builtin dict methods like get or setdefault

>>> d['get'] = 'baz'
>>> d.get('get')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

It's fairly simple to unmask again:

>>> del d['get']
>>> d.get('get', 'Not there, but working')
'Not there, but working'

But I wouldn't use this code in production.


Demonstration without a dict, Python 3:

>>> class MM(MutableMapping):
...   __delitem__, __getitem__, __iter__, __len__, __setitem__ = (None,) *5
...   __slots__ = ()
...
>>> MM().__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MM' object has no attribute '__dict__'
like image 199
Russia Must Remove Putin Avatar answered Sep 21 '22 17:09

Russia Must Remove Putin


The best way to demonstrate this without actually using a dict anywhere is probably to implement something dead simple, very different from dict, and not completely useless. Like a fixed-sized mapping of fixed-size bytes to same-fixed-size bytes. (You might use this for, e.g., a routing table—it'll be much more compact than a dict mapping unpacked keys to unpacked values, although obviously at the cost of speed and flexibility.)

A hash table is just an array of (hash, key, value) tuples. Since the whole point of this is packing data in, we cram those into a struct, meaning we can just use a big bytearray for storage. To mark a slot empty, we set its hash value to 0—which means we need to "escape" any real 0 by turning it into a 1, which is stupid, but simpler to code. We'll also use the dumbest possible probe algorithm for simplicity.

class FixedHashTable(object):
    hashsize = 8
    def __init__(self, elementsize, size):
        self.elementsize = elementsize
        self.size = size
        self.entrysize = self.hashsize + self.elementsize * 2
        self.format = 'q{}s{}s'.format(self.elementsize, self.elementsize)
        assert struct.calcsize(self.format) == self.entrysize
        self.zero = b'\0' * self.elementsize
        self.store = bytearray(struct.pack(self.format, 0,
                                           self.zero, self.zero)
                               ) * self.size
    def hash(self, k):
        return hash(k) or 1
    def stash(self, i, h, k, v):
        entry = struct.pack(self.format, h, k, v)
        self.store[i*self.entrysize:(i+1)*self.entrysize] = entry
    def fetch(self, i):
        entry = self.store[i*self.entrysize:(i+1)*self.entrysize]
        return struct.unpack(self.format, entry)
    def probe(self, keyhash):
        i = keyhash % self.size
        while True:
            h, k, v = self.fetch(i)
            yield i, h, k, v
            i = (i + 1) % self.size
            if i == keyhash % self.size:
                break

As the error message says, you need to provide implementations for the abstract methods __delitem__, __getitem__, __iter__, __len__, and __setitem__. However, a better place to look is the docs, which will tell you that if you implement those five methods (plus any other methods required by the base classes, but as you can see from the table there are none), you'll get all the other methods for free. You may not get the most efficient possible implementations of all of them, but we'll come back to that.

First, let's deal with __len__. Normally people expect this to be O(1), which means we need to keep track of it independently, updating it as needed. So:

class FixedDict(collections.abc.MutableMapping):
    def __init__(self, elementsize, size):
        self.hashtable = FixedHashTable(elementsize, size)
        self.len = 0

Now, __getitem__ just probes until it finds the desired key or reaches the end:

    def __getitem__(self, key):
        keyhash = self.hashtable.hash(key)
        for i, h, k, v in self.hashtable.probe(keyhash):
            if h and k == key:
                return v

And __delitem__ does the same thing, except it empties out the slot if found, and updates len.

    def __delitem__(self, key):
        keyhash = self.hashtable.hash(key)
        for i, h, k, v in self.hashtable.probe(keyhash):
            if h and k == key:
                self.hashtable.stash(i, 0, self.hashtable.zero, self.hashtable.zero)
                self.len -= 1
                return
        raise KeyError(key)

__setitem__ is a bit trickier—if found, we have to replace the value in the slot; if not, we have to fill an empty slot. And here we have to deal with the fact that the hash table may be full. And of course we have to take care of len:

    def __setitem__(self, key, value):
        keyhash = self.hashtable.hash(key)
        for i, h, k, v in self.hashtable.probe(keyhash):
            if not h or k == key:
                if not h:
                    self.len += 1
                self.hashtable.stash(i, keyhash, key, value)
                return
        raise ValueError('hash table full')

And that leaves __iter__. Just as with a dict, we don't have any particular order, so we can just iterate the hash table slots and yield all the non-empty ones:

def __iter__(self):
    return (k for (h, k, v) in self.hashtable.fetch(i)
            for i in range(self.hashtable.size) if h)

While we're at it, we might as well write a __repr__. Note that we can use the fact that we get items for free:

def __repr__(self):
    return '{}({})'.format(type(self).__name__, dict(self.items()))

However, note that the default items just creates an ItemsView(self), and if you track that through the source, you'll see that it iterates self and looks up each value. You can obviously do better if the performance matters:

def items(self):
    pairs = ((k, v) for (h, k, v) in self.hashtable.fetch(i)
             for i in range(self.hashtable.size) if h)
    return collections.abc.ItemsView._from_iterable(pairs)

And likewise for values, and possibly other methods.

like image 34
abarnert Avatar answered Sep 18 '22 17:09

abarnert