Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In python, how does the following AutoVivification class work?

In searching for a way of working with nested dictionaries, I found the following code posted by nosklo, which I would like to have explained, please.

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Testing:

a = AutoVivification()

a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6

print a

Output:

{1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}

I'm a pretty newbie programmer. I have learned most of what I know on my own time on the side, with my only formal training being on Turbo Pascal back in high school. I understand and am able to use classes in simple ways, such as using __init__, class methods, and storing data within instances of the class with foo.man = 'choo'.

I have no idea how the series of square brackets get directed, correctly, through the class (I presume they are calling __getitem__ somehow) and don't understand how they each get handled so concisely without having to call the method three times individually.

I was under the impression that the (dict) in the class declaration would be handled by an __init__.

I've used try: except: before, though again, in quite simple ways. It looks to me like the try, when it runs, is calling a series of function __getitem__. I gather that if the current level's dictionary exists, the try will pass and go to the next dictionary. The except, I gather, runs when there's a KeyError but I haven't seen self used like that before. Self's being treated like a dictionary while I thought self was an instance of class AutoVivification... is it both? I have never assigned twice in a row like this foo = man = choo but suspect that value is pointing to self[item] while self[item] points to the result of type(self). But type(self) would return something like this: <class '__main__.AutoVivification'> wouldn't it? I have no idea what the extra round brackets at the end there are for. Because I don't know how the function is being called, I don't understand where value is being returned.

Sorry for all the questions! There is so much in this that I don't understand and I don't know where to look it up short of reading through the documentation for hours in which I'd retain very little. This code looks like it'll serve my purposes but I want to understand it before using it.

In case you want to know what I'm trying to do in my program with nested dictionaries: I'm trying to hold map data on an astronomical scale. While I can't create dictionaries/lists of 10^6 items nested 4 times (that would be 10^24 items!), the space is mostly empty so I can leave the empty values out completely and only assign when there's something there. What was stumping me was an efficient way of handling the dictionaries.

like image 576
Khono Avatar asked Nov 07 '12 18:11

Khono


1 Answers

Line by line:

class AutoVivification(dict):

We make a subclass of dict, so AutoVivification is a kind of dict, with some local changes.

def __getitem__(self, item):

The __getitem()__ hook is called whenever someone tries to access an item on the instance through [...] index lookups. So whenever someone does object[somekey], type(object).__getitem__(object, somekey) is called.

We'll skip the try for a moment, next line is:

 return dict.__getitem__(self, item)

This calls the unbound method __getitem__(), and passes in our own instance to it, together with the key. In other words, we call the original __getitem__ as defined by our parent class dict.

Now, we all know what happens if there is no item key in a dictionary, a KeyError is raised. This is where the try:, except KeyError combo comes in:

    try:
        return dict.__getitem__(self, item)
    except KeyError:
        value = self[item] = type(self)()
        return value

So, if the current instance (which is a sub-type of dict) doesn't have a given key, it'll catch the KeyError exception the original dict.__getitem__() method throws, and instead we create a new value, store that in self[item] and return that value.

Now, remember that self is a (subclass) of dict, so it's a dictionary. It thus can assign new values (for which it'll use the __setitem__ hook, incidentially), and in this case it creates a new instance of the same type as self. That's another dict subclass.

So what happens in detail when we call a[1][2][3] = 4? Python goes through this step by step:

  1. a[1] leads to type(a).__getitem__(a, 1). The custom __getitem__ method of AutoVivification catches the KeyError, creates a new instance of AutoVivification, stores that under the key 1 and returns it.

  2. a[1] returned an empty AutoVivification instance. The next item access [2] is called on that object, and we repeat what happened in step 1; there is a KeyError, a new instance of AutoVivification is created, stored under the 2 key, and that new instance is returned to the caller.

  3. a[1][2] returned an empty AutoVivification instance. The next item access [3] is called on that object, and we repeat what happened in step 1 (and in step 2). There is a KeyError, a new instance of AutoVivification is created, stored under the 3 key, and that new instance is returned to the caller.

  4. a[1][2][3] returned an empty AutoVivification instance. Now we store a new value in that instance, 4.

Once you go to your next line of code, a[1][3][3] = 5, the top-level AutoVivification instance already has a 1 key, and the return dict.__getitem__(self, item) line will return the corresponding value, which happens to be the AutoVivification instance created in step one above.

From there, the [3] item access call will create a new AutoVivification instance again (because the object at a[1] only has a 2 key), and we go through all the same steps again.

like image 56
Martijn Pieters Avatar answered Oct 15 '22 01:10

Martijn Pieters