I was going through some Hibernate tutorials and got stuck on default_batch_fetch_size. Reading expert comments on "Can Hibernate be used in performance sensitive applications?" clearly explained its significance, but I am trying to understand why the recommended values are 4, 8, 16 or 32, as used in the link.
Regards Tarun
hibernate.jdbc.fetch_size - used to specify the number of rows the JDBC driver fetches per round trip when executing a select query.
hibernate.jdbc.batch_size - used to specify the number of inserts or updates carried out in a single database hit.
The Hibernate documentation gives the following information for @BatchSize: @BatchSize specifies a "batch size" for fetching instances of this class by identifier. Not-yet-loaded instances are loaded batch-size at a time (default 1).
Annotation Type BatchSize: Specifies a batch size for batch fetching of the annotated entity or collection. For example: @Entity @BatchSize(size = 100) class Product { ... }
Answer: of these two options, hibernate.jdbc.batch_size is the one that controls database operations in batch mode; hibernate.jdbc.fetch_size only tunes how many rows the driver pulls back per round trip.
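To make the distinction concrete, here is a minimal sketch of where these settings are applied in a programmatic Hibernate 5 bootstrap (the property keys are the standard Hibernate setting names; the class name and the values are only illustrative assumptions):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class BatchSettingsExample {
    public static SessionFactory buildSessionFactory() {
        Configuration cfg = new Configuration();
        // JDBC result-set fetch size: rows the driver retrieves per round trip
        cfg.setProperty("hibernate.jdbc.fetch_size", "50");
        // JDBC batch size: how many inserts/updates are grouped into one database hit
        cfg.setProperty("hibernate.jdbc.batch_size", "20");
        // Batch fetching of lazy entities/collections by identifier (the setting the question is about)
        cfg.setProperty("hibernate.default_batch_fetch_size", "16");
        // mappings and connection settings are expected in hibernate.cfg.xml
        return cfg.configure().buildSessionFactory();
    }
}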
Summary:
When batch fetching is enabled, Hibernate prepares a lot of queries, and those queries take a lot of memory which cannot be garbage collected. A batch size of 1000 will take something like 150 MB of RAM.
So having a low global batch size (like 10, 20 or 40) is best; only set a bigger batch size for specific collections with the @BatchSize annotation.
Detail:
Fetching batch size is explained in Understanding @BatchSize in Hibernate: "hibernate.default_batch_fetch_size" is the global parameter, and the "@BatchSize" annotation allows you to override the global parameter on a specific association.
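For example (a sketch with made-up entity names, not taken from the original post), the override on one association looks like this:

import java.util.ArrayList;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import org.hibernate.annotations.BatchSize;

@Entity
@BatchSize(size = 16)   // entity-level: lazy Customer proxies are initialized 16 at a time
public class Customer {

    @Id
    @GeneratedValue
    private Long id;

    // association-level override: this collection is batch-fetched 100 at a time,
    // regardless of hibernate.default_batch_fetch_size
    @OneToMany(mappedBy = "customer")
    @BatchSize(size = 100)
    private List<Invoice> invoices = new ArrayList<>();
}

@Entity
class Invoice {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne
    private Customer customer;
}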
But those explanations don't really answer the question: why does the official documentation recommend the values 4, 8 or 16? Obviously, modern databases can handle queries with far more than 16 values in an IN clause, and running queries with, let's say, 1000 values in the IN clause means fewer queries and therefore better performance... So why not set 1000 as the batch size?
I tried it: I set 1024 as the batch size, and the answer came up quickly: the Tomcat server took much more time to start, and in the debug log I could see lots of lines with "Static select for entity ...".
What happened is that Hibernate prepared thousands of static queries; here is part of the log for one entity:
...
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?, ?, ?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?, ?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id in (?, ?)
Static select for entity Profile [PESSIMISTIC_READ]: select xxx_ with (holdlock, rowlock ) where id = ?
...
As you can see, Hibernate prepares the batch fetch queries, but not for every possible size. It prepares queries for 1, 2, 3, ..., 10 arguments, and then only the queries whose number of arguments equals batchSize/(2^n). For example, with batchSize=120 => 120, 60, 30, 15, 10, 9, 8, ..., 2, 1.
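This pattern can be reproduced with a small sketch (an illustration of the behaviour observed in the logs, not Hibernate's actual internal code):

import java.util.ArrayList;
import java.util.List;

public class BatchSizes {

    // Reproduces the observed pattern: batchSize, batchSize/2, batchSize/4, ...
    // while the value stays above 10, plus every size from 10 down to 1.
    static List<Integer> preparedBatchSizes(int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        for (int size = batchSize; size > 10; size /= 2) {
            sizes.add(size);
        }
        for (int size = Math.min(batchSize, 10); size >= 1; size--) {
            sizes.add(size);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Prints [120, 60, 30, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
        System.out.println(preparedBatchSizes(120));
    }
}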
So I tried to batch fetch a collection with various numbers of elements, and the results are:
For fetching 18 items, Hibernate made 2 queries: one with 16 items and one with 2 items.
For fetching 16 items, Hibernate made 1 query with 16 items.
For fetching 12 items, Hibernate made 2 queries: one with 10 items and one with 2 items.
Hibernate only used the statements prepared at startup.
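The way a given number of pending identifiers gets split across those prepared sizes looks like a simple greedy decomposition, sketched below (again only an illustration of the observed behaviour, reusing the hypothetical preparedBatchSizes helper from the previous sketch):

import java.util.ArrayList;
import java.util.List;

public class BatchSplit {

    // Greedily cover 'count' identifiers with the largest prepared batch sizes that fit.
    static List<Integer> splitIntoBatches(int count, List<Integer> preparedSizes) {
        List<Integer> batches = new ArrayList<>();
        int remaining = count;
        while (remaining > 0) {
            for (int size : preparedSizes) {      // preparedSizes is sorted in descending order
                if (size <= remaining) {
                    batches.add(size);
                    remaining -= size;
                    break;
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        // [16, 2] -> matches the "18 items" observation above
        System.out.println(splitIntoBatches(18, BatchSizes.preparedBatchSizes(16)));
        // [120, 60, 30, 10] -> matches the "220 items with batchSize=120" example in the conclusion
        System.out.println(splitIntoBatches(220, BatchSizes.preparedBatchSizes(120)));
    }
}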
After that, I monitored the RAM usage of all these prepared statements:
with batchSize = 0 => 94 MB (this is my reference)
batchSize = 32 => 156 MB (+62 MB over the reference)
batchSize = 64 => 164 MB (+68 MB over the reference)
batchSize = 1000 => 250 MB (+156! MB over the reference)
(my project is medium-sized, about 300 entities)
It's now time for the conclusion:
1) The batch size can have a big effect on startup time and memory consumption. It doesn't scale linearly with the batch size: a batch size of 80 will cost about 2 times more than a batch size of 10.
2) Hibernate can't retrieve a collection of items with a batch of just any size; it only uses the batch queries prepared at startup. If you set batchSize=120, the prepared queries will be those with 120, 60, 30, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2 and 1 arguments. So if you try to fetch a collection with 220 items, 4 queries will be fired: the first will retrieve 120 items, the second 60, the third 30 and the fourth 10.
This explains why the recommended batch sizes are low. I would recommend setting a low global batch size like 20 (20 seems better to me than 16, as it will not generate more prepared queries than 16 does) and setting a bigger @BatchSize on specific associations only when needed.
(I used Hibernate 5.1)
With respect to the memory / startup time concerns, try:
<property name="hibernate.batch_fetch_style" value="dynamic" />
Only one prepared statement with "where id = ?" is created, but the batch fetching of entities of the same type in the session is constructed dynamically, up to the limit of hibernate.default_batch_fetch_size.
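If you configure Hibernate programmatically rather than through XML, the equivalent would be along these lines (a sketch; the property keys are the standard Hibernate 5 setting names, the class name and the value 100 are just examples):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class DynamicBatchFetchExample {
    public static SessionFactory buildSessionFactory() {
        Configuration cfg = new Configuration();
        // Build the IN clause at runtime instead of preparing one statement per batch size
        cfg.setProperty("hibernate.batch_fetch_style", "dynamic");
        // Upper bound for the dynamically built IN clause
        cfg.setProperty("hibernate.default_batch_fetch_size", "100");
        // mappings and connection settings are expected in hibernate.cfg.xml
        return cfg.configure().buildSessionFactory();
    }
}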