I've been reading into how super() works. I came across this recipe that demonstrates how to create an ordered counter:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first seen'
    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__,
                           OrderedDict(self))
    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)
For example:
oc = OrderedCounter('adddddbracadabra')
print(oc)
OrderedCounter(OrderedDict([('a', 5), ('d', 6), ('b', 2), ('r', 2), ('c', 1)]))
Is someone able to explain how this magically works?
This also appears in the Python documentation.
Counter is a subclass of dict that's specially designed for counting hashable objects in Python. It's a dictionary that stores objects as keys and counts as values. To count with Counter, you typically provide a sequence or iterable of hashable objects as an argument to the class's constructor.
Counter is a collection where elements are stored as dict keys and their counts as dict values. Counts are typically integers and may be positive, zero, or negative, but the class itself places no restriction on its keys and values beyond keys being hashable.
OrderedCounter is given as an example in the OrderedDict documentation, and works without needing to override any methods:
class OrderedCounter(Counter, OrderedDict):
    pass
When a method is called on an instance, Python has to find the correct implementation to execute. It searches the class hierarchy in a defined order called the "method resolution order", or mro. The mro is stored in the attribute __mro__:
OrderedCounter.__mro__
(<class '__main__.OrderedCounter'>, <class 'collections.Counter'>, <class 'collections.OrderedDict'>, <class 'dict'>, <class 'object'>)
When an instance of OrderedCounter calls __setitem__(), Python searches the classes in order: OrderedCounter, Counter, OrderedDict (where it is found). So a statement like oc['a'] = 0 ends up calling OrderedDict.__setitem__().
In contrast, __getitem__ is not overridden by any of the subclasses in the mro, so count = oc['a'] is handled by dict.__getitem__().
oc = OrderedCounter()
oc['a'] = 1 # this call uses OrderedDict.__setitem__
count = oc['a'] # this call uses dict.__getitem__
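To check this lookup yourself, here is a minimal sketch that walks __mro__ and reports the first class whose own namespace defines a given method. The helper name defining_class is made up for illustration; it is not part of the standard library.

```python
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

def defining_class(cls, name):
    """Return the first class in cls.__mro__ whose own namespace defines name.

    Hypothetical helper, written only to illustrate the lookup order.
    """
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None

# Report where each method is found when looked up on OrderedCounter.
for name in ('__setitem__', '__getitem__', 'update'):
    print(name, '->', defining_class(OrderedCounter, name).__name__)
```

This only mirrors the simple "first class that defines it wins" rule; real attribute lookup also involves descriptors and instance attributes, which don't matter for these methods.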
A more interesting call sequence occurs for a statement like oc.update('foobar'). First, Counter.update() gets called. The code for Counter.update() uses self[elem], which gets turned into a call to OrderedDict.__setitem__(). And the code for that calls dict.__setitem__().
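One way to see that call sequence is to put a logging layer in front of OrderedDict.__setitem__. This is only a sketch: TracingOrderedCounter and the calls list are names I made up, and it assumes Counter.update() assigns through self[elem] as described above.

```python
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

calls = []  # hypothetical log of every (key, value) passed to __setitem__

class TracingOrderedCounter(OrderedCounter):
    def __setitem__(self, key, value):
        calls.append((key, value))
        # The next __setitem__ in the mro is OrderedDict.__setitem__.
        super().__setitem__(key, value)

oc = TracingOrderedCounter()
oc.update('foobar')   # handled by Counter.update()

print(calls)             # one entry per character counted
print(list(oc.items()))  # keys in first-seen order
```

Each counted character shows up as one assignment in calls, and the keys come back in the order they were first seen.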
If the base classes are reversed, it no longer works, because the mro is different and the wrong methods get called.
class OrderedCounter(OrderedDict, Counter): # <<<== doesn't work
    pass
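A quick sketch of what goes wrong, using the made-up name ReversedCounter so it doesn't clash with the working class. As far as I can tell, the constructor fails because OrderedDict.__init__ now comes first in the mro and expects key/value pairs rather than elements to count; the exact exception and message may vary between Python versions, so the code just reports whatever is raised.

```python
from collections import Counter, OrderedDict

class ReversedCounter(OrderedDict, Counter):  # bases swapped
    pass

# OrderedDict now comes before Counter in the lookup order.
mro_names = [klass.__name__ for klass in ReversedCounter.__mro__]
print(mro_names)

try:
    ReversedCounter('abracadabra')
    failed = False
except Exception as exc:  # broad on purpose: the exact type is not assumed
    failed = True
    print(type(exc).__name__, exc)
```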
More info on mro can be found in the Python 2.3 documentation.
I think we need to define the __repr__ and __reduce__ methods in the class when words are given as input.
Without __repr__ and __reduce__:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    pass
oc = OrderedCounter(['apple', 'banana', 'cherry', 'mango', 'apple', 'pie', 'mango'])
print(oc)
Output:
OrderedCounter({'apple': 2, 'mango': 2, 'banana': 1, 'cherry': 1, 'pie': 1})
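The printed order in the above example is not the order in which the words were first seen, because the inherited Counter.__repr__ lists elements from most to least common.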
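To separate what is stored from what is printed, this small sketch compares the iteration order of the same bare class with the order returned by most_common(). The variable names are mine.

```python
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

words = ['apple', 'banana', 'cherry', 'mango', 'apple', 'pie', 'mango']
oc = OrderedCounter(words)

first_seen = list(oc.items())  # iteration order of the underlying mapping
by_count = oc.most_common()    # sorted by count, highest first

print(first_seen)
print(by_count)
```

So the first-seen order is still stored even without the extra methods; only the default printed form hides it.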
With __repr__ and __reduce__:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'
    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))
    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)
oc = OrderedCounter(['apple', 'banana', 'cherry', 'mango', 'apple', 'pie', 'mango'])
print(oc)
Output:
OrderedCounter(OrderedDict([('apple', 2), ('banana', 1), ('cherry', 1), ('mango', 2), ('pie', 1)]))
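The __reduce__ half is harder to see from print() alone. As I understand it, pickling rebuilds an object from the (callable, args) pair that __reduce__ returns, so a rough way to see its effect is to call it by hand and rebuild the object yourself. The names cls, args and rebuilt are just for illustration.

```python
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'
    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))
    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

oc = OrderedCounter(['apple', 'banana', 'cherry', 'mango', 'apple', 'pie', 'mango'])

# __reduce__ returns the class plus the arguments needed to recreate the object.
cls, args = oc.__reduce__()
rebuilt = cls(*args)

print(list(rebuilt.items()))
```

The rebuilt object keeps both the counts and the first-seen order, because the argument handed back is an OrderedDict.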