Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing Model-level caching

I was posting some comments in a related question about MVC caching and some questions about actual implementation came up. How does one implement a Model-level cache that works transparently without the developer needing to manually cache, yet still remains efficient?

I would keep my caching responsibilities firmly within the model. It is none of the controller's or view's business where the model is getting data. All they care about is that when data is requested, data is provided - this is how the MVC paradigm is supposed to work.

(Source: Post by Jarrod)

The reason I am skeptical is because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it is worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being most often queried over a long period of time in order to accurately make a decision? And wouldn't the overhead of all this make the caching useless anyway?

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

Another poster said this:

I would suggest using the md5 hash of your query combined with a serialized version of your input arguments.

Is the minuscule chance of collision worth worrying about?

Conceptually, caching in the Model seems like a good idea to me, but it seems in practicality and due to the nature of caching the developer should have direct control over it and explicity code it into the controller logic.


Update for Bounty

I am indeed using an extremely lightweight ORM somewhat similar to ActiveRecord but is capable of doing complex joins and subqueries without the n^2 problem. I built it myself, so it is flexible and isn't restrictive in terms of relations or column names, and I just want to understand how I should implement the caching mechanism.

Following the advice of the helpful people, I would take a hash (probably md5) of the query concatenated with a list of its parameters, and use this as the key for that particular data store. Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

How do I know when it should be invalidated? Would I have to parse the UPDATE/DELETE/INSERT queries and sub in parameters manually to find out which records are being modified? Or worse, do additional queries whenever data is modified to keep track of which things have changed and what should be invalidated?

I will award the bounty to whoever can give me a clear conceptual explanation (whether or not this is really necessary/efficient to be done transparently), and if so, has some implementation details for the Model caching. I am using PHP and MySQL if that helps to narrow your focus.

like image 467
Lotus Notes Avatar asked May 26 '10 20:05

Lotus Notes


People also ask

What is model cache?

A cache language model is a type of statistical language model. These occur in the natural language processing subfield of computer science and assign probabilities to given sequences of words by means of a probability distribution.

How is cache implemented?

Implementation: So the standard way to implement cache is to have a data structure, using which we can access value by a given key in constant time. Now all good, we can save key value pairs in memory and retrieve it whenever we need it.

When should caching be implemented?

Caches are generally used to keep track of frequent responses to user requests. It can also be used in the case of storing results of long computational operations. Caching is storing data in a location different than the main data source such that it's faster to access the data.

What are the two types of caching?

L1 cache, or primary cache, is extremely fast but relatively small, and is usually embedded in the processor chip as CPU cache. L2 cache, or secondary cache, is often more capacious than L1.


1 Answers

Your post only makes any sense if the model is a trivial ORM. And there are lots of reasons why that's a bad thing. Try thinking about the model as if it were a web service.

Caching is the responsiblity of the model.

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

But the inputs to the model uniquely define its output.

If you're using the same model to retrieve the contents of a shopping basket and to run a search on your product catalog then there's something wrong with your code.

Even in the case of the shopping basket, there may be merit in caching data with a TTL of less than the time taken to process a transaction which would change its contents, in the case of the catalog search, caching the list of matching products for a few hours will probably have no measurable impact on sales, but trade-off well in reducing database load.

The fact that you are using a trivial ORM out of the box does not exclude you from wrapping it in your own code.

Wouldn't the Model have to be astronomically smart, and/or store statistics

No. You make the determination on whether to cache, and if you can't ensure that the cache is consistent then enforce a TTL based on the type of request.

As a general rule of thumb, you should be able to predict appropriate TTLs based on the SELECT query before binding any variables and this needs to be implemented at design time - but obviously the results should be indexed based on the query after binding.

Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

For preference I would implement this as a decorator on the model class - that way you can easily port it to models which implement a factory rather than trivial ORM.

C.

like image 50
symcbean Avatar answered Oct 08 '22 09:10

symcbean