Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient memoization in Python

I have some task to solve and the most important part at the moment is to make the script as time-efficient as possible. One of the elements I am trying to optimize is memoization within one of the functions.

So my question is: Which of the following 3-4 methods is the most efficient / fastest method of implementing memoization in Python?

I have provided code only as an example - if one of the methods is more efficient, but not in the case I mentioned, please share what you know.

Solution 1 - using mutable variable from outer scope

This solution is often shown as the example memoization, but I am not sure how efficient it is. I have heard that using global variables (in this case it is variable from outer, not global scope) is less efficient.

def main():
    memo = {}
    def power_div(n):
        try:
            return memo[n]
        except (KeyError):
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Solution 2 - using default, mutable argument

I have found somewhere that using default mutable arguments has been used in the past to pass variables from outer scope, when Python searched the variable first in the local scope, then in the global scope, skipping the nonlocal scope (in this case the scope within function main()). Because default argument is initialized only at the time function is defined and is accessible only inside the inner function, maybe it is thus more efficient?

def main():
    def power_div(n, memo={}):
        try:
            return memo[n]
        except (KeyError):
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Or maybe the following version (being in fact a combination of solutions 1&2) is more efficient?

def main():
    memo = {}
    def power_div(n, memo=memo):
        try:
            return memo[n]
        except (KeyError):
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Solution 3 - function's attribute

This is another quite common example of memoization in Python - the memoization object is stored as an attribute of the function itself.

def main():
    def power_div(n):
        memo = power_div.memo
        try:
            return memo[n]
        except (KeyError):
            memo[n] = (n ** 2) % 4  # example expression, should not matter
            return memo[n]
    # extensive usage of power_div() here

Summary

I am very interested in your opinions about the four above solutions for memoization. It is important also, that the function that uses memoization is within another function.

I know that there are also other solutions for memoization (such as Memoize decorator), but it is hard for me to believe that this is more efficient solution than these listed above. Correct me if I am wrong.

Thanks in advance.

like image 902
Tadeck Avatar asked Feb 02 '12 06:02

Tadeck


People also ask

How would you achieve Memoization in Python?

Memoization is a technique of recording the intermediate results so that it can be used to avoid repeated calculations and speed up the programs. It can be used to optimize the programs that use recursion. In Python, memoization can be done with the help of function decorators.

What is the downside of using Memoization?

slowdown in initial execution. space overhead. extra burdens on programmers, because it may require programmers to modify code.

Is Memoization the same as caching?

Is memoization same as caching? Yes, kind of. Memoization is actually a specific type of caching. While caching can refer in general to any storing technique (like HTTP caching) for future use, memoizing specifically involves caching the return values of a function .

What are the benefits of Memoization?

It helps avoid waste by removing the need to recalculate values that have already been produced as part of a previous call. The benefits of memoization will be less apparent in functions that are simple to begin with or infrequently called.


2 Answers

The different styles of variable access have already been timed and compared at: http://code.activestate.com/recipes/577834-compare-speeds-of-different-kinds-of-access-to-var Here's a quick summary: local access beats nonlocal (nested scopes) which beat global access (module scope) which beats access to builtins.

Your solution #2 (with local access) should win. Solution #3 has a slow-dotted lookup (which requires a dictionary lookup). Solution #1 uses nonlocal (nested scope) access which uses cell-variables (faster than a dict lookup but slower than locals).

Also note, the KeyError exception class is a global lookup and can be sped-up by localizing it. You could replace the try/except entirely and use a memo.get(n, sentinel) instead. And even that could be sped-up by using a bound method. Of course, your easiest speed boost may just come from trying out pypy :-)

In short, there are many ways to tweak this code. Just make sure it's worth it.

like image 149
Raymond Hettinger Avatar answered Oct 26 '22 03:10

Raymond Hettinger


For the benefit of people who stumble on this question while looking for a way to do memoization in python, I recommend fastcache.

It works on python 2 and 3, is faster than any of the methods described above, and gives the option to limit cache size so that it does not inadvertently get too big:

from fastcache import clru_cache

@clru_cache(maxsize=128, typed=False)
def foo(cat_1, cat_2, cat_3):
    return cat_1 + cat_2 + cat_3

Installing fastcache is simple, using pip:

pip install fastcache

or conda:

conda install fastcache
like image 31
ostrokach Avatar answered Oct 26 '22 01:10

ostrokach