To override a dict with Python, we can create a subclass of the MutableMapping class. to create a TransformedDict class that is a subclass of the MutableMapping . We use a dict as the value of the store instance variable.
__dict__ in Python represents a dictionary or any mapping object that is used to store the attributes of the object. They are also known as mappingproxy objects.
Yes you can, dictionary is an mutable object so they can be modified within functions, but it must be defined before you actually call the function.
Method 2: Delete all Keys from a Dictionary using dict. clear() The clear() method removes all items from the dictionary. The clear() method doesn't return any value.
You can write an object that behaves like a dict
quite easily with ABCs (Abstract Base Classes) from the collections.abc
module. It even tells you if you missed a method, so below is the minimal version that shuts the ABC up.
from collections.abc import MutableMapping
class TransformedDict(MutableMapping):
"""A dictionary that applies an arbitrary key-altering
function before accessing the keys"""
def __init__(self, *args, **kwargs):
self.store = dict()
self.update(dict(*args, **kwargs)) # use the free update to set keys
def __getitem__(self, key):
return self.store[self._keytransform(key)]
def __setitem__(self, key, value):
self.store[self._keytransform(key)] = value
def __delitem__(self, key):
del self.store[self._keytransform(key)]
def __iter__(self):
return iter(self.store)
def __len__(self):
return len(self.store)
def _keytransform(self, key):
return key
You get a few free methods from the ABC:
class MyTransformedDict(TransformedDict):
def _keytransform(self, key):
return key.lower()
s = MyTransformedDict([('Test', 'test')])
assert s.get('TEST') is s['test'] # free get
assert 'TeSt' in s # free __contains__
# free setdefault, __eq__, and so on
import pickle
# works too since we just use a normal dict
assert pickle.loads(pickle.dumps(s)) == s
I wouldn't subclass dict
(or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict
. And that is exactly what ABCs are for.
How can I make as "perfect" a subclass of dict as possible?
The end goal is to have a simple dict in which the keys are lowercase.
If I override
__getitem__
/__setitem__
, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?Am I preventing pickling from working, and do I need to implement
__setstate__
etc?Do I need repr, update and
__init__
?Should I just use
mutablemapping
(it seems one shouldn't useUserDict
orDictMixin
)? If so, how? The docs aren't exactly enlightening.
The accepted answer would be my first approach, but since it has some issues,
and since no one has addressed the alternative, actually subclassing a dict
, I'm going to do that here.
This seems like a rather simple request to me:
How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.
The accepted answer doesn't actually subclass dict
, and a test for this fails:
>>> isinstance(MyTransformedDict([('Test', 'test')]), dict)
False
Ideally, any type-checking code would be testing for the interface we expect, or an abstract base class, but if our data objects are being passed into functions that are testing for dict
- and we can't "fix" those functions, this code will fail.
Other quibbles one might make:
fromkeys
. The accepted answer also has a redundant __dict__
- therefore taking up more space in memory:
>>> s.foo = 'bar'
>>> s.__dict__
{'foo': 'bar', 'store': {'test': 'test'}}
dict
We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.
If I override
__getitem__
/__setitem__
, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?
Well, implementing them each individually is the downside to this approach and the upside to using MutableMapping
(see the accepted answer), but it's really not that much more work.
First, let's factor out the difference between Python 2 and 3, create a singleton (_RaiseKeyError
) to make sure we know if we actually get an argument to dict.pop
, and create a function to ensure our string keys are lowercase:
from itertools import chain
try: # Python 2
str_base = basestring
items = 'iteritems'
except NameError: # Python 3
str_base = str, bytes, bytearray
items = 'items'
_RaiseKeyError = object() # singleton for no-default behavior
def ensure_lower(maybe_str):
"""dict keys can be any hashable object - only call lower if str"""
return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str
Now we implement - I'm using super
with the full arguments so that this code works for Python 2 and 3:
class LowerDict(dict): # dicts take a mapping or iterable as their optional first argument
__slots__ = () # no __dict__ - that would be redundant
@staticmethod # because this doesn't make sense as a global function.
def _process_args(mapping=(), **kwargs):
if hasattr(mapping, items):
mapping = getattr(mapping, items)()
return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
def __init__(self, mapping=(), **kwargs):
super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
def __getitem__(self, k):
return super(LowerDict, self).__getitem__(ensure_lower(k))
def __setitem__(self, k, v):
return super(LowerDict, self).__setitem__(ensure_lower(k), v)
def __delitem__(self, k):
return super(LowerDict, self).__delitem__(ensure_lower(k))
def get(self, k, default=None):
return super(LowerDict, self).get(ensure_lower(k), default)
def setdefault(self, k, default=None):
return super(LowerDict, self).setdefault(ensure_lower(k), default)
def pop(self, k, v=_RaiseKeyError):
if v is _RaiseKeyError:
return super(LowerDict, self).pop(ensure_lower(k))
return super(LowerDict, self).pop(ensure_lower(k), v)
def update(self, mapping=(), **kwargs):
super(LowerDict, self).update(self._process_args(mapping, **kwargs))
def __contains__(self, k):
return super(LowerDict, self).__contains__(ensure_lower(k))
def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
return type(self)(self)
@classmethod
def fromkeys(cls, keys, v=None):
return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
def __repr__(self):
return '{0}({1})'.format(type(self).__name__, super(LowerDict, self).__repr__())
We use an almost boiler-plate approach for any method or special method that references a key, but otherwise, by inheritance, we get methods: len
, clear
, items
, keys
, popitem
, and values
for free. While this required some careful thought to get right, it is trivial to see that this works.
(Note that haskey
was deprecated in Python 2, removed in Python 3.)
Here's some usage:
>>> ld = LowerDict(dict(foo='bar'))
>>> ld['FOO']
'bar'
>>> ld['foo']
'bar'
>>> ld.pop('FoO')
'bar'
>>> ld.setdefault('Foo')
>>> ld
{'foo': None}
>>> ld.get('Bar')
>>> ld.setdefault('Bar')
>>> ld
{'bar': None, 'foo': None}
>>> ld.popitem()
('bar', None)
Am I preventing pickling from working, and do I need to implement
__setstate__
etc?
And the dict subclass pickles just fine:
>>> import pickle
>>> pickle.dumps(ld)
b'\x80\x03c__main__\nLowerDict\nq\x00)\x81q\x01X\x03\x00\x00\x00fooq\x02Ns.'
>>> pickle.loads(pickle.dumps(ld))
{'foo': None}
>>> type(pickle.loads(pickle.dumps(ld)))
<class '__main__.LowerDict'>
__repr__
Do I need repr, update and
__init__
?
We defined update
and __init__
, but you have a beautiful __repr__
by default:
>>> ld # without __repr__ defined for the class, we get this
{'foo': None}
However, it's good to write a __repr__
to improve the debugability of your code. The ideal test is eval(repr(obj)) == obj
. If it's easy to do for your code, I strongly recommend it:
>>> ld = LowerDict({})
>>> eval(repr(ld)) == ld
True
>>> ld = LowerDict(dict(a=1, b=2, c=3))
>>> eval(repr(ld)) == ld
True
You see, it's exactly what we need to recreate an equivalent object - this is something that might show up in our logs or in backtraces:
>>> ld
LowerDict({'a': 1, 'c': 3, 'b': 2})
Should I just use
mutablemapping
(it seems one shouldn't useUserDict
orDictMixin
)? If so, how? The docs aren't exactly enlightening.
Yeah, these are a few more lines of code, but they're intended to be comprehensive. My first inclination would be to use the accepted answer, and if there were issues with it, I'd then look at my answer - as it's a little more complicated, and there's no ABC to help me get my interface right.
Premature optimization is going for greater complexity in search of performance.
MutableMapping
is simpler - so it gets an immediate edge, all else being equal. Nevertheless, to lay out all the differences, let's compare and contrast.
I should add that there was a push to put a similar dictionary into the collections
module, but it was rejected. You should probably just do this instead:
my_dict[transform(key)]
It should be far more easily debugable.
There are 6 interface functions implemented with the MutableMapping
(which is missing fromkeys
) and 11 with the dict
subclass. I don't need to implement __iter__
or __len__
, but instead I have to implement get
, setdefault
, pop
, update
, copy
, __contains__
, and fromkeys
- but these are fairly trivial, since I can use inheritance for most of those implementations.
The MutableMapping
implements some things in Python that dict
implements in C - so I would expect a dict
subclass to be more performant in some cases.
We get a free __eq__
in both approaches - both of which assume equality only if another dict is all lowercase - but again, I think the dict
subclass will compare more quickly.
MutableMapping
is simpler with fewer opportunities for bugs, but slower, takes more memory (see redundant dict), and fails isinstance(x, dict)
dict
is faster, uses less memory, and passes isinstance(x, dict)
, but it has greater complexity to implement.Which is more perfect? That depends on your definition of perfect.
My requirements were a bit stricter:
My initial thought was to substitute our clunky Path class for a case insensitive unicode subclass - but:
some_dict[CIstr(path)]
is ugly)So I had finally to write down that case insensitive dict. Thanks to code by @AaronHall that was made 10 times easier.
class CIstr(unicode):
"""See https://stackoverflow.com/a/43122305/281545, especially for inlines"""
__slots__ = () # does make a difference in memory performance
#--Hash/Compare
def __hash__(self):
return hash(self.lower())
def __eq__(self, other):
if isinstance(other, CIstr):
return self.lower() == other.lower()
return NotImplemented
def __ne__(self, other):
if isinstance(other, CIstr):
return self.lower() != other.lower()
return NotImplemented
def __lt__(self, other):
if isinstance(other, CIstr):
return self.lower() < other.lower()
return NotImplemented
def __ge__(self, other):
if isinstance(other, CIstr):
return self.lower() >= other.lower()
return NotImplemented
def __gt__(self, other):
if isinstance(other, CIstr):
return self.lower() > other.lower()
return NotImplemented
def __le__(self, other):
if isinstance(other, CIstr):
return self.lower() <= other.lower()
return NotImplemented
#--repr
def __repr__(self):
return '{0}({1})'.format(type(self).__name__,
super(CIstr, self).__repr__())
def _ci_str(maybe_str):
"""dict keys can be any hashable object - only call CIstr if str"""
return CIstr(maybe_str) if isinstance(maybe_str, basestring) else maybe_str
class LowerDict(dict):
"""Dictionary that transforms its keys to CIstr instances.
Adapted from: https://stackoverflow.com/a/39375731/281545
"""
__slots__ = () # no __dict__ - that would be redundant
@staticmethod # because this doesn't make sense as a global function.
def _process_args(mapping=(), **kwargs):
if hasattr(mapping, 'iteritems'):
mapping = getattr(mapping, 'iteritems')()
return ((_ci_str(k), v) for k, v in
chain(mapping, getattr(kwargs, 'iteritems')()))
def __init__(self, mapping=(), **kwargs):
# dicts take a mapping or iterable as their optional first argument
super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
def __getitem__(self, k):
return super(LowerDict, self).__getitem__(_ci_str(k))
def __setitem__(self, k, v):
return super(LowerDict, self).__setitem__(_ci_str(k), v)
def __delitem__(self, k):
return super(LowerDict, self).__delitem__(_ci_str(k))
def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
return type(self)(self)
def get(self, k, default=None):
return super(LowerDict, self).get(_ci_str(k), default)
def setdefault(self, k, default=None):
return super(LowerDict, self).setdefault(_ci_str(k), default)
__no_default = object()
def pop(self, k, v=__no_default):
if v is LowerDict.__no_default:
# super will raise KeyError if no default and key does not exist
return super(LowerDict, self).pop(_ci_str(k))
return super(LowerDict, self).pop(_ci_str(k), v)
def update(self, mapping=(), **kwargs):
super(LowerDict, self).update(self._process_args(mapping, **kwargs))
def __contains__(self, k):
return super(LowerDict, self).__contains__(_ci_str(k))
@classmethod
def fromkeys(cls, keys, v=None):
return super(LowerDict, cls).fromkeys((_ci_str(k) for k in keys), v)
def __repr__(self):
return '{0}({1})'.format(type(self).__name__,
super(LowerDict, self).__repr__())
Implicit vs explicit is still a problem, but once dust settles, renaming of attributes/variables to start with ci (and a big fat doc comment explaining that ci stands for case insensitive) I think is a perfect solution - as readers of the code must be fully aware that we are dealing with case insensitive underlying data structures. This will hopefully fix some hard to reproduce bugs, which I suspect boil down to case sensitivity.
Comments/corrections welcome :)
All you will have to do is
class BatchCollection(dict):
def __init__(self, *args, **kwargs):
dict.__init__(*args, **kwargs)
OR
class BatchCollection(dict):
def __init__(self, inpt={}):
super(BatchCollection, self).__init__(inpt)
A sample usage for my personal use
### EXAMPLE
class BatchCollection(dict):
def __init__(self, inpt={}):
dict.__init__(*args, **kwargs)
def __setitem__(self, key, item):
if (isinstance(key, tuple) and len(key) == 2
and isinstance(item, collections.Iterable)):
# self.__dict__[key] = item
super(BatchCollection, self).__setitem__(key, item)
else:
raise Exception(
"Valid key should be a tuple (database_name, table_name) "
"and value should be iterable")
Note: tested only in python3
After trying out both of the top two suggestions, I've settled on a shady-looking middle route for Python 2.7. Maybe 3 is saner, but for me:
class MyDict(MutableMapping):
# ... the few __methods__ that mutablemapping requires
# and then this monstrosity
@property
def __class__(self):
return dict
which I really hate, but seems to fit my needs, which are:
**my_dict
dict
, this bypasses your code. try it out.isinstance(my_dict, dict)
dict
If you need to tell yourself apart from others, personally I use something like this (though I'd recommend better names):
def __am_i_me(self):
return True
@classmethod
def __is_it_me(cls, other):
try:
return other.__am_i_me()
except Exception:
return False
As long as you only need to recognize yourself internally, this way it's harder to accidentally call __am_i_me
due to python's name-munging (this is renamed to _MyDict__am_i_me
from anything calling outside this class). Slightly more private than _method
s, both in practice and culturally.
So far I have no complaints, aside from the seriously-shady-looking __class__
override. I'd be thrilled to hear of any problems that others encounter with this though, I don't fully understand the consequences. But so far I've had no problems whatsoever, and this allowed me to migrate a lot of middling-quality code in lots of locations without needing any changes.
As evidence: https://repl.it/repls/TraumaticToughCockatoo
Basically: copy the current #2 option, add print 'method_name'
lines to every method, and then try this and watch the output:
d = LowerDict() # prints "init", or whatever your print statement said
print '------'
splatted = dict(**d) # note that there are no prints here
You'll see similar behavior for other scenarios. Say your fake-dict
is a wrapper around some other datatype, so there's no reasonable way to store the data in the backing-dict; **your_dict
will be empty, regardless of what every other method does.
This works correctly for MutableMapping
, but as soon as you inherit from dict
it becomes uncontrollable.
Edit: as an update, this has been running without a single issue for almost two years now, on several hundred thousand (eh, might be a couple million) lines of complicated, legacy-ridden python. So I'm pretty happy with it :)
Edit 2: apparently I mis-copied this or something long ago. @classmethod __class__
does not work for isinstance
checks - @property __class__
does: https://repl.it/repls/UnitedScientificSequence
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With