In searching for a way of working with nested dictionaries, I found the following code posted by nosklo, which I would like to have explained, please.
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
Testing:
a = AutoVivification()
a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6
print a
Output:
{1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}
I'm a pretty newbie programmer. I have learned most of what I know on my own time on the side, with my only formal training being on Turbo Pascal back in high school. I understand and am able to use classes in simple ways, such as using __init__, class methods, and storing data within instances of the class with foo.man = 'choo'.
I have no idea how the series of square brackets get directed, correctly, through the class (I presume they are calling __getitem__ somehow) and don't understand how they each get handled so concisely without having to call the method three times individually.
I was under the impression that the (dict) in the class declaration would be handled by an __init__.
I've used try: except: before, though again, in quite simple ways. It looks to me like the try, when it runs, is calling a series of __getitem__ functions. I gather that if the current level's dictionary exists, the try will pass and go to the next dictionary. The except, I gather, runs when there's a KeyError, but I haven't seen self used like that before. self is being treated like a dictionary while I thought self was an instance of class AutoVivification... is it both? I have never assigned twice in a row like this, foo = man = choo, but suspect that value is pointing to self[item] while self[item] points to the result of type(self). But type(self) would return something like this: <class '__main__.AutoVivification'>, wouldn't it? I have no idea what the extra round brackets at the end there are for. Because I don't know how the function is being called, I don't understand where value is being returned.
Sorry for all the questions! There is so much in this that I don't understand and I don't know where to look it up short of reading through the documentation for hours in which I'd retain very little. This code looks like it'll serve my purposes but I want to understand it before using it.
In case you want to know what I'm trying to do in my program with nested dictionaries: I'm trying to hold map data on an astronomical scale. While I can't create dictionaries/lists of 10^6 items nested 4 times (that would be 10^24 items!), the space is mostly empty so I can leave the empty values out completely and only assign when there's something there. What was stumping me was an efficient way of handling the dictionaries.
Line by line:
class AutoVivification(dict):
We make a subclass of dict, so AutoVivification is a kind of dict, with some local changes.
def __getitem__(self, item):
The __getitem__() hook is called whenever someone tries to access an item on the instance through [...] index lookups. So whenever someone does object[somekey], type(object).__getitem__(object, somekey) is called.
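As a rough sketch of that dispatch, here is a small class of my own (Loud and its seen list are hypothetical, not part of the code in the question) that records every lookup, so you can see that the square brackets and the explicit call go through the same hook:

```python
class Loud(dict):
    """Hypothetical dict subclass that records every [...] lookup."""

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.seen = []  # keys looked up so far

    def __getitem__(self, item):
        self.seen.append(item)
        return dict.__getitem__(self, item)


d = Loud(x=1)
via_brackets = d['x']                    # square-bracket syntax
via_hook = type(d).__getitem__(d, 'x')   # the call Python makes for it
print(d.seen)  # ['x', 'x']
```

Both lines end up in the same method, which is why seen holds the key twice.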
We'll skip the try for a moment; the next line is:
return dict.__getitem__(self, item)
This calls the unbound method __getitem__(), and passes in our own instance to it, together with the key. In other words, we call the original __getitem__ as defined by our parent class dict.
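To see that delegation on its own, here is a minimal sketch with a hypothetical Plain class (my name, not from the question) whose __getitem__ does nothing except hand the lookup to dict. A present key comes back as usual, and a missing key still raises KeyError:

```python
class Plain(dict):
    """Hypothetical subclass that only delegates to dict."""

    def __getitem__(self, item):
        # Call the parent's implementation directly, passing self by hand.
        return dict.__getitem__(self, item)


p = Plain(a=1)
found = p['a']  # the parent class finds the key

try:
    p['missing']
    raised = False
except KeyError:
    # dict.__getitem__ raised, exactly as it would for a plain dict
    raised = True

print(found, raised)
```

Writing super(Plain, self).__getitem__(item) instead should behave the same way here; the explicit dict.__getitem__ form is simply what the code in the question uses.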
Now, we all know what happens if there is no item key in a dictionary: a KeyError is raised. This is where the try: / except KeyError: combo comes in:
try:
    return dict.__getitem__(self, item)
except KeyError:
    value = self[item] = type(self)()
    return value
So, if the current instance (which is a subtype of dict) doesn't have a given key, it'll catch the KeyError exception that the original dict.__getitem__() method raises, and instead we create a new value, store that in self[item] and return that value.
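Now, remember that self is an instance of a subclass of dict, so it is both an AutoVivification instance and a dictionary. It can thus assign new values (for which it'll use the __setitem__ hook, incidentally), and in this case it creates a new instance of the same type as self: type(self) returns the class, and the trailing () calls that class to build a new, empty instance. That's another instance of the same dict subclass.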
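A short sketch of those pieces in isolation may help; the class is the one from the question, while the variable names cls, fresh and child are just mine for illustration:

```python
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # One new object, bound to two targets: value and self[item].
            value = self[item] = type(self)()
            return value


a = AutoVivification()
cls = type(a)    # the class object itself, no instance created yet
fresh = cls()    # calling the class builds a new, empty instance
child = a['k']   # missing key: a new instance is created, stored, returned

print(cls is AutoVivification)  # True
print(a['k'] is child)          # True
```

The second print shows that the chained assignment stored the very same object that was returned, not a copy of it.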
So what happens in detail when we call a[1][2][3] = 4? Python goes through this step by step:
1. a[1] leads to type(a).__getitem__(a, 1). The custom __getitem__ method of AutoVivification catches the KeyError, creates a new instance of AutoVivification, stores that under the key 1 and returns it.
2. a[1] returned an empty AutoVivification instance. The next item access [2] is called on that object, and we repeat what happened in step 1; there is a KeyError, a new instance of AutoVivification is created, stored under the 2 key, and that new instance is returned to the caller.
3. a[1][2] returned an empty AutoVivification instance. The final [3] = 4 is an assignment rather than a lookup, so Python calls the __setitem__ hook on that object instead of __getitem__. No KeyError is involved and no new instance is created; the value 4 is simply stored under the 3 key.
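If you want to watch those steps happen, here is a rough sketch using a hypothetical Traced variant of the class (the name and the shared log list are my own additions) that records each hook call:

```python
class Traced(dict):
    """Hypothetical AutoVivification variant that logs hook calls."""
    log = []  # shared across all instances, for illustration only

    def __getitem__(self, item):
        Traced.log.append(('get', item))
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

    def __setitem__(self, item, value):
        Traced.log.append(('set', item))
        dict.__setitem__(self, item, value)


a = Traced()
a[1][2][3] = 4
print(Traced.log)
# [('get', 1), ('set', 1), ('get', 2), ('set', 2), ('set', 3)]
```

Keys 1 and 2 each show a lookup followed by the store made inside the except block, while key 3 only ever shows a store.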
Once you go to your next line of code, a[1][3][3] = 5, the top-level AutoVivification instance already has a 1 key, and the return dict.__getitem__(self, item) line will return the corresponding value, which happens to be the AutoVivification instance created in step one above.
From there, the [3] item access call will create a new AutoVivification instance again (because the object at a[1] only has a 2 key), and we go through all the same steps again.
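You can check that reuse yourself with an identity test; this is only a sketch, and the names first and same are mine:

```python
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value


a = AutoVivification()
a[1][2][3] = 4
first = a[1]          # the instance created by the first assignment

a[1][3][3] = 5
same = a[1] is first  # the existing instance was reused, not replaced

print(same)                 # True
print(sorted(a[1].keys()))  # [2, 3]
```

The object under key 1 survives the second assignment and simply gains a 3 key next to the 2 key it already had.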