I have two tables Employee and Department following are the entity classes for both of them
Department.java
@Entity
@Table(name = "DEPARTMENT")
public class Department {
@Id
@Column(name = "DEPARTMENT_ID")
@GeneratedValue(strategy = GenerationType.AUTO)
private Integer departmentId;
@Column(name = "DEPARTMENT_NAME")
private String departmentName;
@Column(name = "LOCATION")
private String location;
@OneToMany(cascade = CascadeType.ALL, mappedBy = "department", orphanRemoval = true)
@Fetch(FetchMode.SUBSELECT)
//@Fetch(FetchMode.JOIN)
private List<Employee> employees = new ArrayList<>();
}
Employee.java
@Entity
@Table(name = "EMPLOYEE")
public class Employee {
@Id
@SequenceGenerator(name = "emp_seq", sequenceName = "seq_employee")
@GeneratedValue(generator = "emp_seq")
@Column(name = "EMPLOYEE_ID")
private Integer employeeId;
@Column(name = "EMPLOYEE_NAME")
private String employeeName;
@ManyToOne
@JoinColumn(name = "DEPARTMENT_ID")
private Department department;
}
Below are the queries fired when I did em.find(Department.class, 1);
-- fetch mode = fetchmode.join
SELECT department0_.DEPARTMENT_ID AS DEPARTMENT_ID1_0_0_,
department0_.DEPARTMENT_NAME AS DEPARTMENT_NAME2_0_0_,
department0_.LOCATION AS LOCATION3_0_0_,
employees1_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_1_,
employees1_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_1_,
employees1_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_2_,
employees1_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_2_,
employees1_.EMPLOYEE_NAME AS EMPLOYEE_NAME2_1_2_
FROM DEPARTMENT department0_
LEFT OUTER JOIN EMPLOYEE employees1_
ON department0_.DEPARTMENT_ID =employees1_.DEPARTMENT_ID
WHERE department0_.DEPARTMENT_ID=?
-- fetch mode = fetchmode.subselect
SELECT department0_.DEPARTMENT_ID AS DEPARTMENT_ID1_0_0_,
department0_.DEPARTMENT_NAME AS DEPARTMENT_NAME2_0_0_,
department0_.LOCATION AS LOCATION3_0_0_
FROM DEPARTMENT department0_
WHERE department0_.DEPARTMENT_ID=?
SELECT employees0_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_0_,
employees0_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_0_,
employees0_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_1_,
employees0_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_1_,
employees0_.EMPLOYEE_NAME AS EMPLOYEE_NAME2_1_1_
FROM EMPLOYEE employees0_
WHERE employees0_.DEPARTMENT_ID=?
I just wanted to know which one should we prefer FetchMode.JOIN
or FetchMode.SUBSELECT
? which one should we opt in which scenario?
The FetchType. LAZY tells Hibernate to only fetch the related entities from the database when you use the relationship. This is a good idea in general because there's no reason to select entities you don't need for your uses case. You can see an example of a lazily fetched relationship in the following code snippets.
If the Hibernate annotation @Fetch is not present on a field, then the default FetchMode for this field is: if this field has FetchType = EAGER, then FetchMode = JOIN. Otherwise, FetchMode = SELECT.
The SUBQUERY strategy that Marmite refers to is related to FetchMode.SELECT, not SUBSELECT.
The console output that you've posted about fetchmode.subselect is curious because this is not the way that is supposed to work.
The FetchMode.SUBSELECT
use a subselect query to load the additional collections
Hibernate docs:
If one lazy collection or single-valued proxy has to be fetched, Hibernate will load all of them, re-running the original query in a subselect. This works in the same way as batch-fetching but without the piecemeal loading.
FetchMode.SUBSELECT should look something like this:
SELECT <employees columns>
FROM EMPLOYEE employees0_
WHERE employees0_.DEPARTMENT_ID IN
(SELECT department0_.DEPARTMENT_ID FROM DEPARTMENT department0_)
You can see that this second query will bring to memory all the employees that belongs to some departament (i.e. employee.department_id is not null), it doesn't matter if it is not the department that you retrieve in your first query. So this is potentially a major issue if the table of employees is large because it may be accidentially loading a whole database into memory.
However, FetchMode.SUBSELECT reduces significatly the number of queries because takes only two queries in comparisson to the N+1 queries of the FecthMode.SELECT.
You may be thinking that FetchMode.JOIN makes even less queries, just 1, so why use SUBSELECT at all? Well, it's true but at the cost of duplicated data and a heavier response.
If a single-valued proxy has to be fetched with JOIN, the query may retrieve:
+---------------+---------+-----------+
| DEPARTMENT_ID | BOSS_ID | BOSS_NAME |
+---------------+---------+-----------+
| 1 | 1 | GABRIEL |
| 2 | 1 | GABRIEL |
| 3 | 2 | ALEJANDRO |
+---------------+---------+-----------+
The employee data of the boss is duplicated if he directs more than one department and it has a cost in bandwith.
If a lazy collection has to be fetched with JOIN, the query may retrieve:
+---------------+---------------+-------------+
| DEPARTMENT_ID | DEPARTMENT_ID | EMPLOYEE_ID |
+---------------+---------------+-------------+
| 1 | Sales | GABRIEL |
| 1 | Sales | ALEJANDRO |
| 2 | RRHH | DANILO |
+---------------+---------------+-------------+
The department data is duplicated if it contains more than one employee (the natural case). We don't only suffer a cost in bandwidth but also we get duplicate duplicated Department objects and we must use a SET or DISTINCT_ROOT_ENTITY to de-duplicate.
However, duplicate data in pos of a lower latency is a good trade off in many cases, like Markus Winand says.
An SQL join is still more efficient than the nested selects approach—even though it performs the same index lookups—because it avoids a lot of network communication. It is even faster if the total amount of transferred data is bigger because of the duplication of employee attributes for each sale. That is because of the two dimensions of performance: response time and throughput; in computer networks we call them latency and bandwidth. Bandwidth has only a minor impact on the response time but latencies have a huge impact. That means that the number of database round trips is more important for the response time than the amount of data transferred.
So, the main issue about using SUBSELECT is that is hard to control and may be loading a whole graph of entities into memory. With Batch fetching you fetch the associated entity in a separate query as SUBSELECT (so you don't suffer duplicates), gradually and most important you query only related entities (so you don't suffer from potentially load a huge graph) because the IN subquery is filtered by the IDs retrieved by the outter query).
Hibernate:
select ...
from mkyong.stock stock0_
Hibernate:
select ...
from mkyong.stock_daily_record stockdaily0_
where
stockdaily0_.STOCK_ID in (
?, ?, ?, ?, ?, ?, ?, ?, ?, ?
)
(It may be interesting test if Batch fetching with a very high batch size would act like a SUBSELECT but without the issue of load the whole table)
A couple of posts showing the different fetching strategies and the SQL logs (very important):
Summary:
The tables were built using ascii-tables.
I'd say it depends...
Let assume you have N employees in a department, that contains D bytes of information and an average employee consist of E bytes. (Bytes are sum of the attribute length with some overhead).
Using the join strategy you perform 1 query and transfers N * (D + E) data.
Using the subquery strategy you perform 1 + N queries, but transfers only D + N*E data.
Typically the N+1 query is the NO GO if the N is large, so the JOIN is preferred.
But actually you must check your mileage between number of queries and data transfer.
Note that I'm not considering other aspects as Hibernate caching.
Additional subtle aspect could be valid if the employee table is large and partitioned - partition pruning on the index access comes to the consideration as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With