Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google App Engine - Caching generated HTML

Tags:

I have written a Google App Engine application that programatically generates a bunch of HTML code that is really the same output for each user who logs into my system, and I know that this is going to be in-efficient when the code goes into production. So, I am trying to figure out the best way to cache the generated pages.

The most probable option is to generate the pages and write them into the database, and then check the time of the database put operation for a given page against the time that the code was last updated. Then, if the code is newer than the last put to the database (for a particular HTML request), new HTML will be generated and served, and cached to the database. If the code is older than the last put to the database, then I will just get the HTML direct from the database and serve it (therefore avoiding all the CPU wastage of generating the HTML). I am not only looking to minimize load times, but to minimize CPU usage.

However, one issue that I am having is that I can't figure out how to programatically check when the version of code uploaded to the app engine was updated.

I am open to any suggestions on this approach, or other approaches for caching generated html.

Note that while memcache could help in this situation, I believe that it is not the final solution since I really only need to re-generate html when the code is updated (as opposed to every time the memcache expires).

like image 536
Alexander Marquardt Avatar asked Dec 18 '09 22:12

Alexander Marquardt


People also ask

How long does it take my application to handle a request?

The average length of time it takes to hear back is one to two weeks or around 10-14 days after you submit your application materials. In contrast, certain jobs, like those for government positions could take as long as six to eight weeks to hear back.

What is memcache in Google?

Shared memcache is the free default for App Engine applications. It provides cache capacity on a best-effort basis and is subject to the overall demand of all the App Engine applications using the shared memcache service. Dedicated memcache provides a fixed cache capacity assigned exclusively to your application.


1 Answers

In order of speed:

  1. memcache
  2. cached HTML in data store
  3. full page generation

Your caching solution should take this into account. Essentially, I would probably recommend using memcache anyways. It will be faster than accessing the data store in most cases and when you're generating a large block of HTML, one of the main benefits of caching is that you potentially didn't have to incur the I/O penalty of accessing the data store. If you cache using the data store, you still have the I/O penalty. The difference between regenerating everything and pulling from cached html in the data store is likely to be fairly small unless you have a very complex page. It's probably better to get a bunch of very fast cache hits off memcache and do a full regenerate every once in a while than to make a call out to the data store every time. There's nothing stopping you from invalidating the cached HTML in memcache when you update, and if your traffic is high enough to warrant it, you can always do a multi-level caching system.

However, my main concern is that this is premature optimization. If you don't have the traffic yet, keep caching to a minimum. App Engine provides a set of really convenient performance analysis tools, and you should be using those to identify bottlenecks after you've got at least a few QPS of traffic.

Anytime you're doing performance optimization, measure first! A lot of performance "optimizations" turn out to either be slower than the original, exactly the same, or they have negative user experience characteristics (like stale data). Don't optimize until you're certain you have to.

like image 139
Bob Aman Avatar answered Oct 25 '22 15:10

Bob Aman