
How to lazy load a data structure (python)

I have some way of building a data structure (out of some file contents, say):

def loadfile(FILE):
    return # some data structure created from the contents of FILE

So I can do things like

puppies = loadfile("puppies.csv") # wait for loadfile to work
kitties = loadfile("kitties.csv") # wait some more
print len(puppies)
print puppies[32]

In the above example, I wasted a bunch of time actually reading kitties.csv and creating a data structure that I never used. I'd like to avoid that waste without constantly checking if not kitties whenever I want to do something. I'd like to be able to do

puppies = lazyload("puppies.csv") # instant
kitties = lazyload("kitties.csv") # instant
print len(puppies)                # wait for loadfile
print puppies[32]

So if I don't ever try to do anything with kitties, loadfile("kitties.csv") never gets called.

Is there some standard way to do this?

After playing around with it for a bit, I produced the following solution, which appears to work correctly and is quite brief. Are there some alternatives? Are there drawbacks to using this approach that I should keep in mind?

class lazyload:
    def __init__(self,FILE):
        self.FILE = FILE
        self.F = None
    def __getattr__(self,name):
        if not self.F: 
            print "loading %s" % self.FILE
            self.F = loadfile(self.FILE)
        return object.__getattribute__(self.F, name)

What might be even better is if something like this worked:

class lazyload:
    def __init__(self,FILE):
        self.FILE = FILE
    def __getattr__(self,name):
        self = loadfile(self.FILE) # this never gets called again
                                   # since self is no longer a
                                   # lazyload instance
        return object.__getattribute__(self, name)

But this doesn't work because self is local. It actually ends up calling loadfile every time you do anything.
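One way to get close to that "become the loaded object" behavior is to reassign `self.__class__` and `self.__dict__` on first access instead of rebinding the local `self`. This is a sketch only, with a stand-in `Animals` class and `loadfile` (the real ones would come from your file format), and it assumes the loaded data is an ordinary class instance:

```python
class Animals(object):
    """Stand-in for whatever data structure loadfile would build."""
    def __init__(self, names):
        self.names = names

def loadfile(filename):
    """Stand-in loader; the real one would parse the file."""
    return Animals(["rex", "fido"])

class LazyLoad(object):
    def __init__(self, filename):
        self.filename = filename

    def __getattr__(self, name):
        # First attribute access: load for real, then turn this object
        # into the loaded object in place by swapping class and dict.
        data = loadfile(self.filename)
        self.__class__ = data.__class__
        self.__dict__ = data.__dict__
        return getattr(self, name)
```

Two caveats: assigning `__class__` raises TypeError when the data is a builtin like `list`, and on new-style classes special methods bypass `__getattr__`, so `len(puppies)` would still not trigger the load unless you forward `__len__` and `__getitem__` explicitly.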

Anton Geraschenko asked Dec 29 '10


1 Answer

The csv module in the Python standard library will not load the data until you start iterating over it, so it is in fact lazy.
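As a small sketch in modern Python (using an in-memory buffer as a stand-in for a file on disk): `csv.reader` does no parsing up front, it only parses rows as you pull them.

```python
import csv
import io

# Stand-in for a file on disk; nothing is parsed when the reader is created.
buf = io.StringIO("name,age\nTom,3\nJerry,2\n")
reader = csv.reader(buf)   # instant: no rows have been read yet
header = next(reader)      # parses just the first row
first = next(reader)       # parses just the second row
```

Each `next()` call parses only one more row, so a huge file costs nothing until you actually iterate.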

Edit: If you need to read through the whole file to build the data structure, having a complex lazy-load object that proxies things is overkill. Just do this:

class Lazywrapper(object):
    def __init__(self, filename):
        self.filename = filename
        self._data = None

    def get_data(self):
        if self._data is None:
            self._build_data()
        return self._data

    def _build_data(self):
        # Now open and iterate over the file to build a data structure,
        # and put that data structure as self._data; as a placeholder,
        # reuse the question's loadfile:
        self._data = loadfile(self.filename)

With the above class you can do this:

puppies = Lazywrapper("puppies.csv") # Instant
kitties = Lazywrapper("kitties.csv") # Instant

print len(puppies.get_data()) # Wait
print puppies.get_data()[32]  # Instant (already loaded)

Also

allkitties = kitties.get_data() # wait
print len(allkitties)
print allkitties[32]

If you have a lot of data and don't really need to load all of it, you could also implement a class that reads the file only until it finds the doggie called "Froufrou" and then stops. But at that point it's probably better to stick the data in a database once and for all and access it from there.
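A sketch of that "stop at Froufrou" idea, assuming (hypothetically) that the first CSV column holds the animal's name:

```python
import csv
import io

def find_animal(f, wanted):
    """Scan rows lazily and stop at the first match; the rest of the
    file is never parsed. Returns None if no row matches."""
    for row in csv.reader(f):
        if row and row[0] == wanted:
            return row
    return None

# Only the first two lines are ever parsed here; "Fido" is never touched.
froufrou = find_animal(io.StringIO("Rex,3\nFroufrou,5\nFido,2\n"), "Froufrou")
```

On a large file this touches only the rows up to the match, which is the same pay-as-you-go behavior the question is after, just at row granularity.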

Lennart Regebro answered Oct 05 '22