How can I memoize a class instantiation in Python?

OK, here is the real-world scenario: I'm writing an application, and I have a class that represents a certain type of file (in my case photographs, but that detail is irrelevant to the problem). Each instance of the Photograph class should be unique to the photo's filename.

The problem is, when a user tells my application to load a file, I need to be able to identify when files are already loaded, and use the existing instance for that filename, rather than creating duplicate instances for the same filename.

To me this seems like a good situation to use memoization, and there are a lot of examples of that out there, but in this case I'm not just memoizing an ordinary function; I need to memoize __init__(). This poses a problem, because by the time __init__() gets called it's already too late: a new instance has already been created.

In my research I found Python's __new__() method, and I was actually able to write a working trivial example, but it fell apart when I tried to use it on my real-world objects, and I'm not sure why (the only thing I can think of is that my real world objects were subclasses of other objects that I can't really control, and so there were some incompatibilities with this approach). This is what I had:

    class Flub(object):
        instances = {}

        def __new__(cls, flubid):
            try:
                self = Flub.instances[flubid]
            except KeyError:
                self = Flub.instances[flubid] = super(Flub, cls).__new__(cls)
                print 'making a new one!'
                self.flubid = flubid
            print id(self)
            return self

        @staticmethod
        def destroy_all():
            for flub in Flub.instances.values():
                print 'killing', flub


    a = Flub('foo')
    b = Flub('foo')
    c = Flub('bar')

    print a
    print b
    print c
    print a is b, b is c

    Flub.destroy_all()

Which output this:

    making a new one!
    139958663753808
    139958663753808
    making a new one!
    139958663753872
    <__main__.Flub object at 0x7f4aaa6fb050>
    <__main__.Flub object at 0x7f4aaa6fb050>
    <__main__.Flub object at 0x7f4aaa6fb090>
    True False
    killing <__main__.Flub object at 0x7f4aaa6fb050>
    killing <__main__.Flub object at 0x7f4aaa6fb090>

It's perfect! Only two instances were made for the two unique ids given, and Flub.instances clearly only has two listed.

But when I tried to take this approach with the objects I was using, I got all kinds of nonsensical errors about how __init__() took only 0 arguments, not 2. So I'd change some things around and then it would tell me that __init__() needed an argument. Totally bizarre.

After a while of fighting with it, I basically just gave up and moved all the __new__() black magic into a staticmethod called get, such that I could call Photograph.get(filename) and it would only call Photograph(filename) if filename wasn't already in Photograph.instances.
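A minimal sketch of that workaround, written in Python 3 syntax. The body of Photograph here is a hypothetical stand-in rather than the real class; only the names get and instances come from the description above.

```python
class Photograph:
    # Cache of already-loaded photographs, keyed by filename.
    instances = {}

    def __init__(self, filename):
        self.filename = filename

    @staticmethod
    def get(filename):
        # Only construct a new Photograph when this filename is unseen.
        if filename not in Photograph.instances:
            Photograph.instances[filename] = Photograph(filename)
        return Photograph.instances[filename]


a = Photograph.get('foo.jpg')
b = Photograph.get('foo.jpg')
c = Photograph.get('bar.jpg')
print(a is b, b is c)  # True False
```

The drawback is that calling Photograph(filename) directly still bypasses the cache, so every caller has to remember to use get.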

Does anybody know where I went wrong here? Is there some better way to do this?

Another way of thinking about it is that it's similar to a singleton, except it's not globally singleton, just singleton-per-filename.

Here's my real-world code using the staticmethod get if you want to see it all together.

robru asked Jun 04 '12 09:06


2 Answers

Let us see two points about your question.

Using memoize

You can use memoization, but you should decorate the class, not the __init__ method. Suppose we have this memoizer:

    def get_id_tuple(f, args, kwargs, mark=object()):
        """
        Some quick'n'dirty way to generate a unique key for a specific call.
        """
        # Keys are built from id(), so equal but distinct objects yield different keys.
        l = [id(f)]
        for arg in args:
            l.append(id(arg))
        # The marker separates positional arguments from keyword arguments.
        l.append(id(mark))
        # Use items(): iterating over the dict itself yields only the keys.
        for k, v in kwargs.items():
            l.append(k)
            l.append(id(v))
        return tuple(l)

    _memoized = {}
    def memoize(f):
        """
        Some basic memoizer
        """
        def memoized(*args, **kwargs):
            key = get_id_tuple(f, args, kwargs)
            if key not in _memoized:
                _memoized[key] = f(*args, **kwargs)
            return _memoized[key]
        return memoized

Now you just need to decorate the class:

    @memoize
    class Test(object):
        def __init__(self, somevalue):
            self.somevalue = somevalue

Let us run a quick test:

    tests = [Test(1), Test(2), Test(3), Test(2), Test(4)]
    for test in tests:
        print test.somevalue, id(test)

The output is below. Note that the same parameters yield the same id of the returned object:

    1 3072319660
    2 3072319692
    3 3072319724
    2 3072319692
    4 3072319756

Anyway, I would prefer to create a function to generate the objects and memoize it. Seems cleaner to me, but it may be some irrelevant pet peeve:

    class Test(object):
        def __init__(self, somevalue):
            self.somevalue = somevalue

    @memoize
    def get_test_from_value(somevalue):
        return Test(somevalue)
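On Python 3.2 or later, the same factory idea could also be sketched with the standard library's functools.lru_cache in place of the hand-written memoize above. This is a swapped-in technique, not the decorator from this answer. It keys on the hash and equality of the arguments rather than on id(), so the arguments must be hashable. The filenames below are made-up values.

```python
from functools import lru_cache


class Test:
    def __init__(self, somevalue):
        self.somevalue = somevalue


@lru_cache(maxsize=None)
def get_test_from_value(somevalue):
    # lru_cache compares arguments by hash and equality, so equal
    # values map to the same cached instance.
    return Test(somevalue)


first = get_test_from_value('photo.jpg')
# An equal string built at runtime still hits the cache.
second = get_test_from_value(''.join(['photo', '.jpg']))
third = get_test_from_value('other.jpg')
print(first is second, first is third)  # True False
```

Keying on equality matters for filenames in particular, since two equal strings are not guaranteed to be the same object.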

Using __new__:

Or, of course, you can override __new__. Some days ago I posted an answer about the ins, outs and best practices of overriding __new__ that can be helpful. Basically, it says to always pass *args, **kwargs to your __new__ method.
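A minimal sketch of that advice in Python 3 syntax, reusing the Flub class from the question. The init_calls counter is a hypothetical addition for illustration. The point to notice is that Python passes the same arguments to both __new__ and __init__, and calls __init__ on whatever __new__ returns as long as it is an instance of the class, even when that instance came from the cache. A mismatch between those two signatures may be what produces the confusing argument-count errors.

```python
class Flub:
    instances = {}

    def __new__(cls, flubid, *args, **kwargs):
        # Reuse the cached instance for this id when there is one.
        try:
            return cls.instances[flubid]
        except KeyError:
            # object.__new__ accepts no extra arguments, so pass only cls.
            self = cls.instances[flubid] = super().__new__(cls)
            return self

    def __init__(self, flubid, *args, **kwargs):
        # Runs after every Flub(...) call, cached or not.
        self.flubid = flubid
        self.init_calls = getattr(self, 'init_calls', 0) + 1


a = Flub('foo')
b = Flub('foo')
print(a is b, a.init_calls)  # True 2
```

Because __init__ runs again on a cached instance, any expensive or state-resetting work in it should be guarded or moved into __new__.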

I, for one, would prefer to memoize a function that creates the objects, or even write a specific function that takes care of never recreating an object for the same parameter. Of course, this is mostly an opinion of mine, not a rule.

brandizzi answered Sep 19 '22 20:09


The solution that I ended up using is this:

    class memoize(object):
        def __init__(self, cls):
            self.cls = cls
            self.__dict__.update(cls.__dict__)

            # This bit allows staticmethods to work as you would expect.
            for attr, val in cls.__dict__.items():
                if type(val) is staticmethod:
                    self.__dict__[attr] = val.__func__

        def __call__(self, *args):
            key = '//'.join(map(str, args))
            if key not in self.cls.instances:
                self.cls.instances[key] = self.cls(*args)
            return self.cls.instances[key]

And then you decorate the class with this, not __init__. Although brandizzi provided me with that key piece of information, his example decorator didn't function as desired.

I found this concept quite subtle, but basically when you're using decorators in Python, you need to understand that the thing that gets decorated (whether it's a method or a class) is actually replaced by the decorator itself. So for example when I'd try to access Photograph.instances or Camera.generate_id() (a staticmethod), I couldn't actually access them because Photograph doesn't actually refer to the original Photograph class, it refers to the memoized function (from brandizzi's example).
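The effect can be reproduced with a small sketch in Python 3 syntax. The function-based memoize below is a simplified, hypothetical stand-in for a decorator of that kind, not brandizzi's exact code.

```python
def memoize(cls):
    cache = {}

    def memoized(*args):
        if args not in cache:
            cache[args] = cls(*args)
        return cache[args]
    return memoized


@memoize
class Photograph:
    instances = {}

    def __init__(self, filename):
        self.filename = filename


# The name Photograph is now bound to the inner function, not the class.
print(type(Photograph).__name__)         # function
print(hasattr(Photograph, 'instances'))  # False
```

Instantiation still works through the function, but anything that lived on the class itself is no longer reachable through that name.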

To get around this, I had to create a decorator class that actually took all the attributes and static methods from the decorated class and exposed them as its own. Almost like a subclass, except that the decorator class doesn't know ahead of time what classes it will be decorating, so it has to copy the attributes over after the fact.

The end result is that any instance of the memoize class becomes an almost transparent wrapper around the actual class that it has decorated, with the exception that attempting to instantiate it (but really calling it) will provide you with cached copies when they're available.
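To illustrate, here is a sketch in Python 3 syntax that repeats the decorator and applies it to a hypothetical Camera class. The make and model arguments and the body of generate_id are made up for the example; the decorated class is assumed to define its own instances dict.

```python
class memoize:
    def __init__(self, cls):
        self.cls = cls
        self.__dict__.update(cls.__dict__)
        # Unwrap staticmethods so they can be called through the wrapper.
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]


@memoize
class Camera:
    instances = {}

    def __init__(self, make, model):
        self.make = make
        self.model = model

    @staticmethod
    def generate_id(make, model):
        return '%s %s' % (make, model)


a = Camera('Canon', 'EOS')
b = Camera('Canon', 'EOS')
print(a is b)                              # True
print(Camera.generate_id('Canon', 'EOS'))  # Canon EOS
print(sorted(Camera.instances))            # ['Canon//EOS']
```

One caveat of building the key with str() and '//': different argument lists can produce the same key, for example ('a//b',) and ('a', 'b'), so this fits best when the arguments are simple values such as filenames.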

robru answered Sep 21 '22 20:09