
Hibernate: batch_size? Second Level Cache?

I have a Hibernate domain object that gets loaded by different parts of the application. Sometimes it's advantageous to lazy-load every association, and other times it's better to load the entire thing in one join. As a hopefully happy compromise I've found:

hibernate.default_batch_fetch_size:

Using batch fetching, Hibernate can load several uninitialized proxies if one proxy is accessed. Batch fetching is an optimization of the lazy select fetching strategy.
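
For reference, here is a minimal sketch of the kind of mapping I mean (Invoice and LineItem are just placeholder names):

    import java.util.ArrayList;
    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.FetchType;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.OneToMany;
    import org.hibernate.annotations.BatchSize;

    @Entity
    public class Invoice {

        @Id
        @GeneratedValue
        private Long id;

        // Lazy by default; when one uninitialized collection of this role is
        // touched, Hibernate can initialize up to 3 of them in a single query.
        @OneToMany(mappedBy = "invoice", fetch = FetchType.LAZY)
        @BatchSize(size = 3)
        private List<LineItem> items = new ArrayList<LineItem>();
    }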

I also see:

hibernate.jdbc.fetch_size:

A non-zero value determines the JDBC fetch size (calls Statement.setFetchSize()).
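
For completeness, a rough sketch of where I'd set both properties (a plain Hibernate bootstrap; the values are arbitrary):

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class BootstrapSketch {
        public static void main(String[] args) {
            Configuration cfg = new Configuration().configure();
            // Initialize up to 16 uninitialized proxies/collections per SELECT.
            cfg.setProperty("hibernate.default_batch_fetch_size", "16");
            // Hint to the JDBC driver: rows per round trip (Statement.setFetchSize()).
            cfg.setProperty("hibernate.jdbc.fetch_size", "50");
            SessionFactory factory = cfg.buildSessionFactory();
        }
    }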

Well, is Hibernate smart enough to look in the second-level cache when doing the batch fetching? i.e., do one fetch for the initial call to the association and then have the next X calls hit the cache? That way I can have the lazy loading I desire but also hit the cache often for the more bulk-like transactions.

If the entire contents of the collection are already in the cache, would it still execute the fetch queries when the collection is accessed?

Thanks.

davidemm asked Aug 25 '09 16:08


1 Answer

I did a lot of research today and was able to dig up a response to my own question. I was looking through the Hibernate code and the flow looks like this:

Is the collection initialized?

  • No? Do a batch fetch (items obtained by batch-fetch are placed in the cache)
  • Yes? Look in the cache for the particular item; if it's not there, do a batch-fetch.

So if the item in the collection you're looking for IS FOUND in the cache, the batch-fetch doesn't happen. If the item IS NOT found in the second-level cache, the batch fetch happens, BUT it will fetch the batched items REGARDLESS of whether they are already in the cache.
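
Restated as code, the decision logic looks roughly like this. This is a paraphrase, not Hibernate's actual source; the helper methods are hypothetical stand-ins for the internal steps:

    public class BatchFetchFlowSketch {

        Object resolveItem(Object collection, Long itemId) {
            if (!isInitialized(collection)) {
                // Cold collection: batch-fetch, and put the fetched items in the cache.
                return batchFetch(itemId);
            }
            Object cached = lookInSecondLevelCache(itemId);
            if (cached != null) {
                return cached; // cache hit: no SQL issued
            }
            // Cache miss: batch-fetch, pulling the neighbouring items from the
            // database even if they are already cached.
            return batchFetch(itemId);
        }

        // Hypothetical stand-ins for the internal steps.
        private boolean isInitialized(Object collection)   { return false; }
        private Object lookInSecondLevelCache(Long itemId) { return null; }
        private Object batchFetch(Long itemId)             { return new Object(); }
    }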


----- EXAMPLE 1 -----

The Good:

(Three items in a collection - batch size of 3) The first go:

  • collection.getItem(0) - No cache | batch-fetch 3 items
  • collection.getItem(1) - Loaded in by batch-fetch
  • collection.getItem(2) - Loaded in by batch-fetch

Now, somewhere else, later in time:

  • collection.getItem(0) - Cache Hit
  • collection.getItem(1) - Cache Hit
  • collection.getItem(2) - Cache Hit
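
A minimal way to watch this happen is Hibernate's Statistics API. The sketch below assumes a hypothetical Parent entity that is cached in the second level along with its lazy items collection, which carries @BatchSize(size = 3); HibernateUtil is a stand-in for whatever builds your SessionFactory:

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.stat.Statistics;

    public class CacheHitCheck {
        public static void main(String[] args) {
            SessionFactory factory = HibernateUtil.getSessionFactory(); // hypothetical helper
            Statistics stats = factory.getStatistics();
            stats.setStatisticsEnabled(true);

            // First go: touching the collection triggers the batch-fetch SELECT.
            Session first = factory.openSession();
            Parent parent = (Parent) first.get(Parent.class, 1L);
            parent.getItems().size();
            first.close();

            // Later, in a new session: the same items should come from the cache.
            Session later = factory.openSession();
            Parent again = (Parent) later.get(Parent.class, 1L);
            again.getItems().size();
            later.close();

            System.out.println("L2 hits:    " + stats.getSecondLevelCacheHitCount());
            System.out.println("L2 misses:  " + stats.getSecondLevelCacheMissCount());
            System.out.println("SQL issued: " + stats.getPrepareStatementCount());
        }
    }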

----- EXAMPLE 2 -----

The Bad:

(Three items in a collection - batch size of 3)

In this case, the item at index 0 was removed from the cache, perhaps because the cache was full and the item was evicted, or because it went stale or idle.

  • collection.getItem(0) - Not In Cache So Do Batch Of 3 (select * where id in (?, ?, ?))
  • collection.getItem(1) - In Cache Already (replaced by batch-fetch anyway)
  • collection.getItem(2) - In Cache Already (replaced by batch-fetch anyway)

So the trade-off here is that you'll have fewer SQL calls thanks to batching, but you'll miss your cache more often. There is a ticket open to have batching look in the second-level cache before it goes out to the database:

http://opensource.atlassian.com/projects/hibernate/browse/HHH-1775

Vote it up!

davidemm answered Nov 09 '22 14:11