Hibernate provides (at least) two options for getting around the N+1 query problem. The one is setting the FetchMode to Subselect, which generates a select with a IN-clause and a subselect within this IN-clause. The other is to specify a BatchSize, which generates a select with a IN-clause containing the parents' IDs.
Both work but I find that the Subselect option often runs into performance problems due to the query for the parents being complex. On the other hand, with a large BatchSize (say 1000), the number of queries and complexity of those queries are very small.
My question is thus: when would you use Hibernate's Subselect FetchMode over BatchSize? Subselect probably makes sense if you have a very large number of parent entries (thousands), but are there any other scenarios where you'd prefer a Subselect to BatchSize?
EDIT: I noticed a difference between the two when dealing with eager loading. If you have an xToMany association set to be loaded eagerly and through a subselect, it generates a subselect like it would if it was lazy. If you specify a BatchSize however, the generated query makes use of a outer join instead of a seperate query. Is there any way to force Hibernate to use a seperate batched query when loading eagerly?
You can use Hibernate's @Subselect annotation to map an entity to an SQL query. In the following code snippet, I use this annotation to select the id, the title and the number of reviews of a Book and map them to the BookSummary entity.
By default, Hibernate uses lazy select fetching for collections and lazy proxy fetching for single-valued associations. These defaults make sense for most associations in the majority of applications. If you set hibernate. default_batch_fetch_size , Hibernate will use the batch fetch optimization for lazy fetching.
If the Hibernate annotation @Fetch is not present on a field, then the default FetchMode for this field is: if this field has FetchType = EAGER, then FetchMode = JOIN. Otherwise, FetchMode = SELECT.
I don't use subselect, because it is hard to control. In a very large system which has complex business logic and a large team working on it, it is too hard to say which queries are used. Subselect may work in specific cases where you exactly know which query is performed.
Batch fetching has some big advantages. It is not always the fastest, but usually fast enough. On the other hand it is very stable, doesn't have any side effects and is completely transparent to the business logic. I never use batch values higher then 100. It is sufficient to reduce the N+1 to some reasonable amount of queries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With