
When does the CPU benefit of having a Hibernate 2nd level cache outweigh the initial hit?

When does the CPU benefit of having an object added to the Hibernate 2nd level object cache outweigh the initial hit?

I am currently using Hibernate without a 2nd level cache. This is for an application that processes music files (www.jthink.net/songkong), and it uses Hibernate so that it can scale with more data, i.e. it can process 100,000 songs with little more memory than 1,000 songs. Once the songs have been processed, they are of no further interest (unless the user runs Undo).

As I understand it, if I enable the 2nd level cache (for my Song class), the initial write of a song to the cache will use more CPU than just writing to the database, and additional modifications to the song object will also require more CPU. But subsequent retrieval of the song from Ehcache will require fewer resources than retrieving it from the database.

My songs are processed folder by folder and go through a number of stages (on different Executors). When they are queued on the next Executor we pass just the song ids as parameters; otherwise it would use a lot of heap memory storing the Song objects themselves. So when a particular task is actually run on an Executor, the first thing it does is retrieve the songs for those ids.

So there are no particular song ids that are retrieved 1000s of times, but every song is typically written between 1 and 4 times and retrieved 10 times. So if we had quite a small cache (because I want to keep heap memory under close control), I would expect the songs from the first few folders processed to be added to the cache; then, as those folders complete, songs from new folders would take their place in the cache.

But my question is, is it worth it?

As a rule of thumb, does a ratio of 10 retrievals to 1-4 writes justify using a 2nd level cache, or is it only useful if the ratio is more like 100:1?

Paul Taylor asked Oct 16 '18 09:10


1 Answer

The real answer is: Just benchmark it.

Writing to an on-heap cache isn't that costly. So yes, even retrieving an entry once from the cache will be faster than going back to the database.

Beyond that, a cache mostly does two things on top of what a HashMap does: it evicts and it expires.

Eviction means that you set some maximum size to the cache. When this is reached, the cache will evict the "oldest" entry to add a new one. There are multiple definitions for oldest. Ehcache does a sampling over a set of entries and kicks out the entry that wasn't accessed for the longest time in the sample.
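To make the sampling idea concrete, here is a minimal sketch in plain Java. It is not Ehcache's actual implementation: the class name, the sample size of 8 and the logical clock are all assumptions made for illustration. It only shows the general technique of picking the least recently accessed entry out of a random sample.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of sampled eviction; NOT Ehcache's real code.
class SampledEvictionCache<K, V> {
    private static final int SAMPLE_SIZE = 8; // assumed value, chosen for the example

    private static final class Entry<V> {
        final V value;
        long lastAccess;

        Entry(V value, long lastAccess) {
            this.value = value;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Random random = new Random();
    private final int maxSize;
    private long clock = 0; // logical clock, bumped on every get and put

    SampledEvictionCache(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.maxSize = maxSize;
    }

    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        entry.lastAccess = ++clock; // remember that this entry was just used
        return entry.value;
    }

    void put(K key, V value) {
        // Only a brand new key in a full cache forces an eviction.
        if (!entries.containsKey(key) && entries.size() >= maxSize) {
            evictOne();
        }
        entries.put(key, new Entry<>(value, ++clock));
    }

    int size() {
        return entries.size();
    }

    private void evictOne() {
        // Copying the key set is O(n); a real cache would sample more cheaply.
        List<K> keys = new ArrayList<>(entries.keySet());
        K victim = null;
        long oldest = Long.MAX_VALUE;
        for (int i = 0; i < Math.min(SAMPLE_SIZE, keys.size()); i++) {
            K candidate = keys.get(random.nextInt(keys.size()));
            long accessed = entries.get(candidate).lastAccess;
            if (accessed < oldest) {
                oldest = accessed;
                victim = candidate;
            }
        }
        entries.remove(victim);
    }
}
```

Because only a sample is inspected, the evicted entry is not guaranteed to be the globally oldest one. That is the trade-off for not having to maintain a strict LRU ordering.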

Expiration means that a given entry will be considered stale at some point. For instance, you might want to keep an entry for 1 hour before refreshing it with the latest value from the database. When you get an entry, Ehcache first checks whether it has expired. If it has, Ehcache returns null and removes the entry from the cache. This means that an expired entry stays in the cache until you try to access it.
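The "expired entries are only removed on access" behaviour can be sketched like this. Again, this is plain Java written for illustration, not Ehcache code; the class name is made up, and the current time is passed in explicitly only to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of lazy (on-access) expiration; NOT Ehcache's real code.
class LazyExpiryCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;

    LazyExpiryCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // The caller supplies "now" so the behaviour can be checked without sleeping.
    void put(K key, V value, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    V get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            entries.remove(key); // the stale entry is dropped only now, on access
            return null;
        }
        return entry.value;
    }

    int size() {
        return entries.size();
    }
}
```

With a TTL of 1000 ms, an entry put at time 0 is still counted by size() at time 2000 as long as nobody asks for it. The first get after expiry returns null and frees the slot.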

In your case, you will want to load the entry once, keep it in the cache while you use it, and finally remove it to save memory. If you have a final step where you know you won't need the entry anymore, just remove it there.

If you don't, you will have to rely on eviction, which is acceptable here because the eviction algorithm removes expired entries first (why remove a perfectly valid entry if you can remove an expired one?).

You should calculate how long an entry needs to stay in the cache to go through all the Executors. This will be your expiry time (TTL). Then size your cache to roughly NB_EXECUTORS * NB_STEPS, which should match the number of songs currently in use. When a new song is added, the cache will need to evict an old entry. In most cases that entry will already be expired, so no harm done.

To avoid eviction (which can be costly when no expired entry is found), you can write a background routine that gets entries, which triggers their expiration. But again, don't do that before being sure, using a benchmark, that it is actually faster.

Finally, you might want to cache the Song directly instead of using the Hibernate second-level cache, because fewer operations are needed to get the song. Also, when writing an entity that is in the second-level cache, Hibernate tends to evict it from the cache. Make sure you configure it NOT to do that.

A note about modification. By default, the Ehcache on-heap cache (and only the on-heap cache) stores entries by reference. So if you retrieve a Song object from the cache and then modify it, the entry in the cache is modified as well, since it's actually the one and only instance.

However, that's not how the Hibernate second-level cache works. It keeps something like a database row in the cache, which is converted to a Song and returned to you.
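The difference between the two behaviours can be shown with two toy caches. Everything here is hypothetical: the Song class is a two-field stand-in, and the "row" is reduced to a single String. Neither class is how Ehcache or Hibernate is really implemented; the sketch only shows by-reference storage versus rebuilding the object from stored state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified entity used only for this example.
class Song {
    final long id;
    String title;

    Song(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// By-reference storage: the cache holds the very same instance the caller holds.
class ByReferenceCache {
    private final Map<Long, Song> entries = new HashMap<>();

    void put(Song song) {
        entries.put(song.id, song);
    }

    Song get(long id) {
        return entries.get(id);
    }
}

// Row-like storage: the cache keeps flat state and builds a new Song on every get.
class RowLikeCache {
    private final Map<Long, String> rows = new HashMap<>();

    void put(Song song) {
        rows.put(song.id, song.title);
    }

    Song get(long id) {
        String title = rows.get(id);
        return title == null ? null : new Song(id, title);
    }
}
```

With ByReferenceCache, changing the title of a retrieved Song changes what the next get returns. With RowLikeCache, the change is invisible to the cache until the Song is put again.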

When you save the Song to database, Hibernate will evict it from the cache as I was saying above (but you might ask for a cache update in the configuration, I'm not sure about that).

That's why I think you should cache directly instead of using the second-level cache. However, watch out: the object you get was loaded by Hibernate. You need to detach it from Hibernate before putting it in the cache, and then reattach it in the new executor. Otherwise, if you have collections for instance, strange things can happen.

Now, assuming you want to update the cache and the database every time, you have two ways to do it.

With cache-aside, you update the DB and then update the cache.

With cache-through, you update the cache, which takes care (atomically) of updating the DB. Cache-through is a little more involved since you need to provide a CacheLoaderWriter implementation. But it makes sure the cache and database are always in sync.
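Here is a rough sketch of the two patterns in plain Java. It deliberately does not use Ehcache or its CacheLoaderWriter interface: FakeDatabase, CacheAsideRepository and WriteThroughCache are hypothetical names, and the "database" is just a map. It only illustrates who is responsible for keeping the two stores in step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real database.
class FakeDatabase {
    private final Map<Long, String> rows = new HashMap<>();

    void save(long id, String title) {
        rows.put(id, title);
    }

    String load(long id) {
        return rows.get(id);
    }
}

// Cache-aside: the calling code talks to both the database and the cache.
class CacheAsideRepository {
    private final FakeDatabase db;
    private final Map<Long, String> cache = new HashMap<>();

    CacheAsideRepository(FakeDatabase db) {
        this.db = db;
    }

    void update(long id, String title) {
        db.save(id, title);   // 1. update the DB
        cache.put(id, title); // 2. then update the cache
    }

    String find(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        String loaded = db.load(id); // cache miss: go back to the database
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }
}

// Cache-through: callers only talk to the cache, which owns all database access.
class WriteThroughCache {
    private final FakeDatabase db;
    private final Map<Long, String> entries = new HashMap<>();

    WriteThroughCache(FakeDatabase db) {
        this.db = db;
    }

    // Both stores are updated under one lock, so callers never see them disagree.
    synchronized void put(long id, String title) {
        db.save(id, title);
        entries.put(id, title);
    }

    synchronized String get(long id) {
        // Loads from the database on a miss; a null result is not cached.
        return entries.computeIfAbsent(id, db::load);
    }
}
```

In the cache-aside version every caller has to remember to do both steps in the right order. In the cache-through version that responsibility lives in one place, which is the role a CacheLoaderWriter implementation plays in Ehcache.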

Henri answered Oct 11 '22 09:10
