
Caching data within a Python class (to avoid expensive filesystem reads on App Engine)

This question isn't entirely App Engine specific, but it might help knowing the context: I have a kind of "static site generator" on App Engine that renders pages and allows them to be styled via various themes and theme settings. The themes are currently stored directly on the App Engine filesystem and uploaded with the application. A theme consists of a few templates and yaml configuration data.

To encapsulate working with themes, I have a Theme class. theme = Theme('sunshine'), for example, constructs a Theme instance that loads and parses the configuration data of the theme called 'sunshine', and allows calls like theme.render_template('index.html') that automatically load and render the correct file on the filesystem.

Problem is, loading and especially parsing a Theme's (yaml) configuration data every time a new request comes in and instantiates a Theme is expensive. So, I want to cache the data within the processes/App Engine instances and maybe later within memcached.

Until now, I've used very simple caches like so:

class Theme(object):
     _theme_variables_cache = {}

     def __init__(self, name):
         self.name = name

         if name not in Theme._theme_variables_cache:
             Theme._theme_variables_cache[name] = self.load_theme_variables()

...

(I'm aware that the config could be read multiple times when several requests hit the constructor at the same time. I don't think it causes problems though.)
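
If duplicate loads under concurrent requests ever did become a concern, a lock around the check-and-load step would ensure each theme is parsed only once. A minimal sketch, assuming a threaded runtime; `_cache_lock` and `load_count` are hypothetical names added for illustration, and `load_theme_variables` returns placeholder data instead of parsing yaml:

```python
import threading


class Theme(object):
    _theme_variables_cache = {}
    _cache_lock = threading.Lock()  # hypothetical: guards the cache
    load_count = 0  # illustration only: counts how often the loader runs

    def __init__(self, name):
        self.name = name
        # Hold the lock for both the check and the load so that two threads
        # cannot both miss the cache and parse the same theme twice.
        with Theme._cache_lock:
            if name not in Theme._theme_variables_cache:
                Theme._theme_variables_cache[name] = self.load_theme_variables()
        self.theme_variables = Theme._theme_variables_cache[name]

    def load_theme_variables(self):
        # Hypothetical stand-in for reading and parsing the yaml files.
        Theme.load_count += 1
        return {"theme": self.name}


# Construct the same theme from several threads at once.
threads = [threading.Thread(target=Theme, args=("sunshine",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off is that every constructor call briefly takes the lock, which is cheap compared with parsing a configuration file.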

But that kind of caching gets ugly really quickly. I have several different things I want to read from config files and all of the caches are dictionaries because every different theme 'name' also points to a different underlying configuration.

The last idea I had was creating a function like Theme._cached_func(func) that only executes func when the function's result isn't already cached for the specific template (remember, when the object represents a different template, the cached value can also be different). So I could use it like: self.theme_variables = Theme._cached_func(self.load_theme_variables). However, I have a feeling I'm missing something obvious here, as I'm still pretty new to Python.

Is there an obvious and clean Python caching pattern that will work for such a situation without cluttering up the entire class with cache logic? I think I can't just memoize function results via decorators or something because different templates will have to have different caches. I don't even need any "stale" cache handling because the underlying configuration data doesn't change while a process runs.
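
A memoizing decorator can still fit this situation as long as the theme name is part of the cache key, because each name then gets its own cache entry. A minimal sketch, assuming Python 3 (functools.lru_cache is not available on Python 2.7); the module-level `load_theme_variables` function and the `calls` list are hypothetical, and the returned dict is placeholder data:

```python
import functools

calls = []  # illustration only: records which themes were actually loaded


@functools.lru_cache(maxsize=None)
def load_theme_variables(theme_name):
    # Hypothetical stand-in for reading and parsing the theme's yaml files.
    calls.append(theme_name)
    return {"theme": theme_name}


class Theme(object):
    def __init__(self, name):
        self.name = name
        # Cached per theme name: the loader body runs once per distinct name.
        self.theme_variables = load_theme_variables(name)


Theme("sunshine")
Theme("sunshine")
Theme("midnight")
```

Note that every Theme with the same name receives the same dict object, so the cached data should be treated as read-only.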

Update

I ended up doing it like this:

class ThemeConfig(object):
    __instances_cache = {}

    @classmethod
    def get_for(cls, theme_name):
        # Check first: dict.setdefault would evaluate ThemeConfig(theme_name)
        # (and re-read the yaml files) on every call, even on a cache hit.
        if theme_name not in cls.__instances_cache:
            cls.__instances_cache[theme_name] = ThemeConfig(theme_name)
        return cls.__instances_cache[theme_name]

    def __init__(self, theme_name):
        self.theme_name = theme_name
        self._load_assets_urls()  # those calls load yaml files
        self._load_variables()
...


class Theme(object):
    def __init__(self, theme_name):
        self.theme_name = theme_name
        self.config = ThemeConfig.get_for(theme_name)
...

So ThemeConfig stores all the configuration that's read from the filesystem for a theme, and the factory method ThemeConfig.get_for always hands out the same ThemeConfig instance for the same theme name. The only caching logic I have is the membership check in the factory method, and Theme objects are still as temporary and non-shared as they always were, so I can use and abuse them however I wish.

asked Oct 21 '22 03:10 by Carst3n


1 Answer

I will take a shot at this. Basically, a factory pattern can be used here to maintain a clean boundary between your Theme object and the particular way in which Theme instances are created.

The factory itself can also maintain a simple caching strategy by storing a mapping between the theme name and the data loaded for that theme. I would go with the following implementation:

# The ThemeFactory class loads a theme's variables the first time that theme
# is requested and serves them from its cache afterwards
class ThemeFactory(object):

     def __init__(self):
         self.__theme_variables_cache = {}

     def createTheme(self, theme_name):
         if theme_name not in self.__theme_variables_cache:
             theme = Theme(theme_name)
             self.__theme_variables_cache[theme_name] = theme.load_theme_variables()
         return self.__theme_variables_cache[theme_name]

The definition of the Theme class is now clean and simple and does not contain any caching logic:

class Theme(object):

    def __init__(self, name):
        self.__theme_name = name

    def load_theme_variables(self):
        # contains the logic for loading theme variables from theme files
        raise NotImplementedError

This approach has the advantages of maintainable code and a clear separation of responsibilities (although not a complete one: the factory class still maintains the simple cache. Ideally it would just hold a reference to a caching service or another class that handles caching, but you get the point).
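
The idea of handing the factory a cache, rather than letting it own one, could look roughly like this. It is only a sketch: the `loader` and `cache` parameters, `get_theme_variables`, and `fake_loader` are hypothetical names introduced here, and any dict-like object is assumed to be acceptable as the cache:

```python
class ThemeFactory(object):
    def __init__(self, loader, cache=None):
        # loader: callable taking a theme name and returning its variables.
        # cache: any dict-like object; defaults to a plain in-process dict.
        self._loader = loader
        self._cache = {} if cache is None else cache

    def get_theme_variables(self, theme_name):
        # Only call the loader on a cache miss.
        if theme_name not in self._cache:
            self._cache[theme_name] = self._loader(theme_name)
        return self._cache[theme_name]


loaded = []  # illustration only: records loader invocations


def fake_loader(theme_name):
    # Hypothetical stand-in for reading and parsing theme files.
    loaded.append(theme_name)
    return {"theme": theme_name}


shared_cache = {}
factory = ThemeFactory(fake_loader, cache=shared_cache)
first = factory.get_theme_variables("sunshine")
second = factory.get_theme_variables("sunshine")
```

Because the factory only relies on `in`, item assignment, and item lookup, the plain dict could later be swapped for another object that offers the same operations.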

Your Theme class does what it does best: loading theme variables. Because of the factory pattern, the client code (the code that consumes Theme instances) is kept isolated from the logic of creating them. As your application grows, you can extend this factory to control the creation of various Theme objects (including classes derived from Theme).

Note that this is just one way of achieving simple caching behavior as well as instance creation encapsulation.

One more point: you could store Theme objects in the cache instead of the theme variables. This way the theme variables are read from the theme files only on first use (lazy loading). In this case, however, you need to store the theme variables as an instance variable of the Theme class. The method load_theme_variables(self) then needs to be written this way:

def load_theme_variables(self):
    # The theme variables are stored in the instance variable __theme_variables,
    # which __init__ must initialize to None
    if self.__theme_variables is not None:
        return self.__theme_variables
    # __read_theme_file is a private method that reads the theme files
    self.__theme_variables = self.__read_theme_file(self.__theme_name)
    return self.__theme_variables
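
Put together with a factory that caches Theme objects, the lazy-loading variant might look like the following. This is a sketch under assumptions: `get_theme` and `read_count` are hypothetical names, and `__read_theme_file` returns placeholder data instead of parsing real theme files:

```python
class Theme(object):
    read_count = 0  # illustration only: counts reads of the theme files

    def __init__(self, name):
        self.__theme_name = name
        self.__theme_variables = None  # filled in on first use

    def load_theme_variables(self):
        # Read the theme files only if this instance has not done so yet.
        if self.__theme_variables is None:
            self.__theme_variables = self.__read_theme_file(self.__theme_name)
        return self.__theme_variables

    def __read_theme_file(self, theme_name):
        # Hypothetical stand-in for reading and parsing the theme files.
        Theme.read_count += 1
        return {"theme": theme_name}


class ThemeFactory(object):
    def __init__(self):
        self.__themes = {}

    def get_theme(self, theme_name):
        # Creating a Theme is cheap; nothing is read until it is first used.
        if theme_name not in self.__themes:
            self.__themes[theme_name] = Theme(theme_name)
        return self.__themes[theme_name]


factory = ThemeFactory()
theme = factory.get_theme("sunshine")
reads_before_use = Theme.read_count
variables = theme.load_theme_variables()
factory.get_theme("sunshine").load_theme_variables()
```

The factory instance has to outlive individual requests (for example as a module-level object) for the cached Theme objects to be reused.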

Hopefully, this gives you an idea on how to go about achieving your use case.

answered Oct 27 '22 11:10 by Prahalad Deshpande