Whose responsibility is it to cache / memoize function results?

I'm working on software which allows the user to extend a system by implementing a set of interfaces.

In order to test the viability of what we're doing, my company "eats its own dog food" by implementing all of our business logic in these classes in the exact same way a user would.

We have some utility classes / methods that tie everything together and use the logic defined in the extendable classes.


I want to cache the results of the user-defined functions. Where should I do this?

  • Is it the classes themselves? This seems like it can lead to a lot of code duplication.

  • Is it the utilities/engine which uses these classes? If so, an uninformed user may call the class function directly and not receive any caching benefit.


Example code

public interface ILetter
{
    string[] GetAnimalsThatStartWithMe();
}

public class A : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new [] { "Aardvark", "Ant" };
    }
}

public class B : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new [] { "Baboon", "Banshee" };
    }
}

/* ...Left to user to define... */

public class Z : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new [] { "Zebra" };
    }
}

public static class LetterUtility
{
    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        if(letter == 'A') return (new A()).GetAnimalsThatStartWithMe();
        if(letter == 'B') return (new B()).GetAnimalsThatStartWithMe();
        /* ... */
        if(letter == 'Z') return (new Z()).GetAnimalsThatStartWithMe();
        throw new ApplicationException("Letter " + letter + " not found");
    }
}

Should LetterUtility be responsible for caching? Should each individual instance of ILetter? Is there something else entirely that can be done?

I'm trying to keep this example short, so the functions above don't really need caching. But suppose I add the following class, where (new C()).GetAnimalsThatStartWithMe() takes 10 seconds every time it runs:

public class C : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        Thread.Sleep(10000); // simulate 10 seconds of expensive work (needs using System.Threading)
        return new [] { "Cat", "Capybara", "Clam" };
    }
}

I find myself torn between two options: making our software as fast as possible while maintaining less code (in this example: caching the result in LetterUtility), or doing the exact same work over and over (in this example: waiting 10 seconds every time C is used).
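To make the LetterUtility option concrete, here is a minimal sketch of caching at the utility layer. CachingLetterLookup and its resolve delegate are hypothetical names that stand in for the if-chain in LetterUtility; nothing here expires, so it only fits results that never change.

```csharp
using System;
using System.Collections.Concurrent;

public interface ILetter { string[] GetAnimalsThatStartWithMe(); }

public class A : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Aardvark", "Ant" };
    }
}

// Hypothetical utility-level cache: one entry per letter, no expiration.
public sealed class CachingLetterLookup
{
    // Hypothetical stand-in for the if-chain that maps a letter to its ILetter.
    private readonly Func<char, ILetter> _resolve;

    // Lazy<T> ensures the stored entry computes its value at most once,
    // even if several threads ask for the same letter at the same time.
    private readonly ConcurrentDictionary<char, Lazy<string[]>> _cache =
        new ConcurrentDictionary<char, Lazy<string[]>>();

    public CachingLetterLookup(Func<char, ILetter> resolve)
    {
        _resolve = resolve ?? throw new ArgumentNullException(nameof(resolve));
    }

    public string[] GetAnimalsThatStartWithLetter(char letter)
    {
        Lazy<string[]> entry = _cache.GetOrAdd(
            letter,
            key => new Lazy<string[]>(() => _resolve(key).GetAnimalsThatStartWithMe()));
        return entry.Value;
    }
}
```

A caller would write something like new CachingLetterLookup(c => new A()).GetAnimalsThatStartWithLetter('A'). Anyone who calls new A().GetAnimalsThatStartWithMe() directly still bypasses the cache, which is exactly the drawback described in the second bullet above.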

Michael asked Nov 30 '11 15:11


2 Answers

Which layer is best responsible for caching of the results of these user-definable functions?

The answer is pretty obvious: the layer that can correctly implement the desired cache policy is the right layer.

A correct cache policy needs to have two characteristics:

  • It must never serve up stale data. It must know whether the method being cached is going to produce a different result, and invalidate the cache at some point before the caller would get stale data.

  • It must manage cached resources efficiently on the user's behalf. A cache without an expiration policy that grows without bounds has another name: we usually call it a "memory leak".

What's the layer in your system that knows the answers to the questions "is the cache stale?" and "is the cache too big?" That's the layer that should implement the cache.
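As a rough illustration of what answering those two questions looks like in code, here is a minimal sketch of a caching decorator. CachingLetter, CountingLetter, the timeToLive parameter, and the injectable clock are all hypothetical; the assumption is that the layer constructing the decorator knows how long a result stays valid. It holds a single entry per instance, so its size is bounded by construction.

```csharp
using System;

public interface ILetter { string[] GetAnimalsThatStartWithMe(); }

// Hypothetical stand-in for an expensive implementation; counts real calls.
public class CountingLetter : ILetter
{
    public int Calls { get; private set; }

    public string[] GetAnimalsThatStartWithMe()
    {
        Calls++;
        return new[] { "Cat", "Capybara", "Clam" };
    }
}

// Hypothetical decorator: caches the wrapped result until it expires.
public sealed class CachingLetter : ILetter
{
    private readonly ILetter _inner;
    private readonly TimeSpan _timeToLive;
    private readonly Func<DateTime> _clock;
    private readonly object _gate = new object();
    private string[] _cached;
    private DateTime _expiresAt;

    // clock is optional and exists only so that expiry can be tested.
    public CachingLetter(ILetter inner, TimeSpan timeToLive, Func<DateTime> clock = null)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _timeToLive = timeToLive;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public string[] GetAnimalsThatStartWithMe()
    {
        // The lock makes concurrent callers wait for one computation
        // instead of each repeating the expensive call.
        lock (_gate)
        {
            DateTime now = _clock();
            if (_cached == null || now >= _expiresAt)
            {
                _cached = _inner.GetAnimalsThatStartWithMe();
                _expiresAt = now + _timeToLive;
            }
            return _cached;
        }
    }
}
```

Usage would be along the lines of ILetter c = new CachingLetter(new CountingLetter(), TimeSpan.FromMinutes(10)). A fixed time-to-live is only one possible staleness rule; if the owning layer can detect changes directly, it could invalidate explicitly instead. Note that callers receive the same array instance each time, so a caller that mutates it would affect everyone else.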

Eric Lippert answered Oct 22 '22 08:10


Something like caching can be considered a "cross-cutting" concern (http://en.wikipedia.org/wiki/Cross-cutting_concern):

"In computer science, cross-cutting concerns are aspects of a program which affect other concerns. These concerns often cannot be cleanly decomposed from the rest of the system in both the design and implementation, and can result in either scattering (code duplication), tangling (significant dependencies between systems), or both. For instance, if writing an application for handling medical records, the bookkeeping and indexing of such records is a core concern, while logging a history of changes to the record database or user database, or an authentication system, would be cross-cutting concerns since they touch more parts of the program."

Cross-cutting concerns can often be implemented via aspect-oriented programming (http://en.wikipedia.org/wiki/Aspect-oriented_programming):

"In computing, aspect-oriented programming (AOP) is a programming paradigm which aims to increase modularity by allowing the separation of cross-cutting concerns. AOP forms a basis for aspect-oriented software development."

There are many tools in .NET that facilitate aspect-oriented programming. I'm most fond of those that provide a completely transparent implementation. In the example of caching:

public class Foo
{
    [Cache(10)] // cache for 10 minutes
    public virtual void Bar() { ... }
}

That's all you need to do on the class itself. Everything else happens automatically once you define a behavior like so:

public class CachingBehavior
{
    // Intercepts invocations of methods marked with the [Cache] attribute.
    // For caching, this method would check whether some cache store already
    // contains the data and, if it does, return it; otherwise it would perform
    // the normal method operation and store the result.
    public void Intercept(IInvocation invocation) { ... }
}

There are two general schools for how this happens:

  1. Post-build IL weaving. Tools like PostSharp, Microsoft CCI, and Mono Cecil can be configured to rewrite these attributed methods so that they automatically delegate to your behaviors.

  2. Runtime proxies. Tools like Castle DynamicProxy and Microsoft Unity can automatically generate proxy types (a type derived from Foo that overrides Bar in the example above) that delegate to your behavior.
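To give a feel for the runtime-proxy school without pulling in a third-party package, here is a minimal sketch built on System.Reflection.DispatchProxy from the base class library (assuming a modern .NET runtime). It is a swapped-in technique, not the Castle or Unity API: DispatchProxy works on interfaces rather than virtual methods, and CachingProxy, Wrap, and SlowLetter are hypothetical names. The sketch memoizes every parameterless, value-returning method forever, so a real version would still need the attribute lookup and an expiration policy.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

public interface ILetter { string[] GetAnimalsThatStartWithMe(); }

// Hypothetical stand-in for an expensive implementation; counts real calls.
public class SlowLetter : ILetter
{
    public int Calls { get; private set; }

    public string[] GetAnimalsThatStartWithMe()
    {
        Calls++;
        return new[] { "Cat", "Capybara", "Clam" };
    }
}

// Hypothetical caching behavior. At runtime, DispatchProxy generates a type
// that implements T and routes every interface call through Invoke.
public class CachingProxy<T> : DispatchProxy where T : class
{
    private readonly ConcurrentDictionary<MethodInfo, object> _results =
        new ConcurrentDictionary<MethodInfo, object>();
    private T _target;

    public static T Wrap(T target)
    {
        T proxy = Create<T, CachingProxy<T>>();
        ((CachingProxy<T>)(object)proxy)._target = target;
        return proxy;
    }

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        // Only parameterless methods that return a value are cached here;
        // everything else is forwarded to the real object untouched.
        bool cacheable = (args == null || args.Length == 0)
                         && targetMethod.ReturnType != typeof(void);
        if (!cacheable)
        {
            return targetMethod.Invoke(_target, args);
        }

        // Under contention the real method may run more than once,
        // but only one result is stored and returned afterwards.
        return _results.GetOrAdd(targetMethod, m => m.Invoke(_target, null));
    }
}
```

Usage would look like ILetter c = CachingProxy<ILetter>.Wrap(new SlowLetter()). The caller keeps working against ILetter, and the implementing class contains no caching code at all.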

Jeff answered Oct 22 '22 08:10