
Caching Strategy for queried data

I'm currently in the process of building a repository for a project that will be DB-intensive (performance tests have been carried out and caching is needed, hence why I'm asking).

The way I've got it set up now is that each object is individually cached. If I want to run a query for those objects, I pass the query to the database and it returns just the IDs required. (For some simple queries I cache and manage the IDs myself.)

I then hit the cache with these IDs and pull the objects out; any missing objects are bundled into a WHERE IN statement and fired at the database, at which point I repopulate the cache with the missing entries.

The queries themselves are most likely to be about paging/ordering the data.
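In outline the read path looks something like this (a simplified sketch; CustomerRepository, Customer and loadBatch are just illustrative names for my own types):

```csharp
using System;
using System.Collections.Generic;

// Cache-aside lookup: the query returns IDs, objects come from the cache,
// and any misses are fetched from the DB in one batched WHERE IN call.
public class CustomerRepository
{
    private readonly IDictionary<int, Customer> _cache = new Dictionary<int, Customer>();
    private readonly Func<IEnumerable<int>, IEnumerable<Customer>> _loadBatch;

    public CustomerRepository(Func<IEnumerable<int>, IEnumerable<Customer>> loadBatch)
    {
        _loadBatch = loadBatch; // e.g. one "SELECT ... WHERE Id IN (...)" call
    }

    public IList<Customer> GetByIds(IEnumerable<int> ids)
    {
        var result = new List<Customer>();
        var missing = new List<int>();

        foreach (var id in ids)
        {
            if (_cache.TryGetValue(id, out var cached))
                result.Add(cached);   // cache hit
            else
                missing.Add(id);      // collect misses for one batched query
        }

        if (missing.Count > 0)
        {
            foreach (var customer in _loadBatch(missing))
            {
                _cache[customer.Id] = customer; // repopulate the cache
                result.Add(customer);
            }
        }

        return result;
    }
}

public class Customer
{
    public int Id { get; set; }
    public string Email { get; set; }
}
```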

Is this a suitable strategy? Or perhaps are there better techniques available?

asked Jun 03 '09 by TWith2Sugars


2 Answers

This is a reasonable approach; I have gone this route before, and it works well for simple caching.

However, when you are updating or writing to the database you will run into some interesting problems, and you should handle these scenarios carefully.

For example, your cached data will become stale if the user updates a record in the database. In that scenario you will either need to update the in-memory cache at the same time, or purge the entry so that it gets refreshed on the next fetch.
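The write path can take either form (a rough sketch; UpdateCustomer, the key scheme and the Customer type from the question's sketch are placeholders):

```csharp
using System.Web;

public class CustomerWriter
{
    public void SaveCustomer(Customer customer)
    {
        UpdateCustomer(customer); // write to the database first

        // Option 1: write-through -- keep the cached copy in sync.
        HttpRuntime.Cache.Insert("customer:" + customer.Id, customer);

        // Option 2: purge -- drop the entry and let the next read repopulate it.
        // HttpRuntime.Cache.Remove("customer:" + customer.Id);
    }

    private void UpdateCustomer(Customer customer)
    {
        // Placeholder for the actual UPDATE statement / ORM call.
    }
}
```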

Things can also get tricky if, for example, the user updates a customer's email address that lives in a separate table but is associated via a foreign key: you then have to invalidate every cached object that embeds that related data.

Besides database caching you should also consider output caching. This works quite well if, for example, you have a table that shows sales data for the previous month. The table could be stored in a separate file that gets included in a bunch of other pages. If you cache that file, then whenever those pages request it the caching engine can serve it straight from disk and the business logic layer doesn't even get hit. This isn't applicable all the time, but it's quite useful for custom controls.
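For example, you can set the output-cache policy in the code-behind (a rough sketch; the declarative <%@ OutputCache Duration="3600" VaryByParam="None" %> directive on a page or user control achieves the same thing):

```csharp
using System;
using System.Web;
using System.Web.UI;

// Code-behind for the page (or control) that renders the sales table.
public class SalesReport : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Cache the rendered output on the server for one hour, so repeat
        // requests are served without hitting the business logic layer.
        Response.Cache.SetCacheability(HttpCacheability.Server);
        Response.Cache.SetExpires(DateTime.UtcNow.AddHours(1));
        Response.Cache.SetValidUntilExpires(true);
    }
}
```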

Unit of Work Pattern

It also helps to know about the Unit of Work pattern.

When you're pulling data in and out of a database, it's important to keep track of what you've changed; otherwise, that data won't be written back into the database. Similarly, you have to insert new objects you create and remove any objects you delete.

You can change the database with each change to your object model, but this can lead to lots of very small database calls, which ends up being very slow. Furthermore, it requires you to have a transaction open for the whole interaction, which is impractical if you have a business transaction that spans multiple requests. The situation is even worse if you need to keep track of the objects you've read so you can avoid inconsistent reads.

A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
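A bare-bones version might look like this (just a sketch of the pattern; UnitOfWork and IDatabaseGateway are illustrative names, not a framework API, and Customer is the type from the question's sketch):

```csharp
using System;
using System.Collections.Generic;

// Minimal Unit of Work: registers changes during a business transaction and
// flushes them to the database in one go at commit time.
public class UnitOfWork
{
    private readonly List<Customer> _newObjects = new List<Customer>();
    private readonly List<Customer> _dirtyObjects = new List<Customer>();
    private readonly List<Customer> _removedObjects = new List<Customer>();

    public void RegisterNew(Customer c)     { _newObjects.Add(c); }
    public void RegisterDirty(Customer c)   { if (!_dirtyObjects.Contains(c)) _dirtyObjects.Add(c); }
    public void RegisterRemoved(Customer c) { _removedObjects.Add(c); }

    // Commit opens one transaction and issues the minimal set of statements.
    public void Commit(IDatabaseGateway db)
    {
        using (var tx = db.BeginTransaction())
        {
            foreach (var c in _newObjects)     db.Insert(c);
            foreach (var c in _dirtyObjects)   db.Update(c);
            foreach (var c in _removedObjects) db.Delete(c);
            tx.Commit();
        }
    }
}

// Hypothetical gateway standing in for your data-access layer.
public interface IDatabaseGateway
{
    ITransaction BeginTransaction();
    void Insert(Customer c);
    void Update(Customer c);
    void Delete(Customer c);
}

public interface ITransaction : IDisposable
{
    void Commit();
}
```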

answered Sep 27 '22 by aleemb


If you are using SQL Server, you can use SqlCacheDependency, which automatically invalidates your cached entry when the underlying table changes in the database, so it gets repopulated on the next read. See the MSDN documentation for SqlCacheDependency.
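Roughly, the usage looks like this (a sketch; the "MyDb" entry name, table name and key are assumptions, and the polling-based notification setup must be done first, e.g. via SqlCacheDependencyAdmin and a matching <sqlCacheDependency> entry in web.config):

```csharp
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class CustomerCache
{
    // One-time setup (e.g. at application start), assuming connectionString
    // is defined elsewhere:
    //   SqlCacheDependencyAdmin.EnableNotifications(connectionString);
    //   SqlCacheDependencyAdmin.EnableTableForNotifications(connectionString, "Customers");
    public static void CacheCustomers(IList<Customer> customers)
    {
        // "MyDb" names a <sqlCacheDependency> database entry in web.config;
        // the cached item is evicted when the Customers table changes.
        var dependency = new SqlCacheDependency("MyDb", "Customers");
        HttpRuntime.Cache.Insert("customers:all", customers, dependency);
    }
}
```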

There is also a similar cache-dependency solution for files (a CacheDependency on a file rather than on the DB); you would need to make some changes, as per the SqlCacheDependency documentation above, to get a cache dependency on the database instead.
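The file-based variant looks something like this (path and key are illustrative):

```csharp
using System.Web;
using System.Web.Caching;

public static class FragmentCache
{
    // The cached fragment is evicted whenever the file on disk changes.
    public static void CacheSalesTable(string renderedHtml)
    {
        var dependency = new CacheDependency(
            HttpContext.Current.Server.MapPath("~/cache/sales-table.html"));
        HttpRuntime.Cache.Insert("sales:table", renderedHtml, dependency);
    }
}
```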

Hope this helps :)

answered Sep 27 '22 by SO User