Summary :
I am trying to minimize the number of queries my JPA based java application makes to the database. I specified the @BatchFetch(BatchFetchType.IN)
optimization hint, but I still see some extra queries that I would consider as redundant and unnecessary.
Details:
Considering a simple domain model: We have Invoice mgmt system. Invoice has OneToOne relationship with Order. We also have customer, which has OneToMany relationship with Orders. (Customer 1->M Order 1<-1 Invoice). Find more details here. Find the complete source code here. Here is the entity definition, as it currently stands:
Client.java (Excluding getters and setter):
@Entity(name = "CUSTOMER")
public class Customer {
@Id //signifies the primary key
@Column(name = "CUST_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long custId;
@Column(name = "FIRST_NAME", length = 50)
private String firstName;
@OneToMany(mappedBy="customer",targetEntity=Order.class,
fetch=FetchType.LAZY)
private Collection<Order> orders;
}
Order.java (Excluding getters and setter):
@Entity(name = "ORDERS")
public class Order {
@Id
@Column(name = "ORDER_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long orderId;
@Column(name = "TOTAL_PRICE", precision = 2)
private double totPrice;
@OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL, mappedBy = "order")
private Invoice invoice;
@ManyToOne(optional = false)
@JoinColumn(name = "CUST_ID", referencedColumnName = "CUST_ID")
private Customer customer;
@ManyToMany(fetch = FetchType.LAZY)
@JoinTable(name = "ORDER_DETAIL", joinColumns = @JoinColumn(name = "ORDER_ID", referencedColumnName = "ORDER_ID"), inverseJoinColumns = @JoinColumn(name = "PROD_ID", referencedColumnName = "PROD_ID"))
private List<Product> productList;
}
Invoice.java (Excluding getters and setter):
@Entity(name = "ORDER_INVOICE")
public class Invoice {
@Id
// signifies the primary key
@Column(name = "INVOICE_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long invoiceId;
@Column(name = "AMOUNT_DUE", precision = 2)
private double amountDue;
@OneToOne(optional = false, fetch = FetchType.LAZY)
@JoinColumn(name = "ORDER_ID")
private Order order;
}
With this model in place, I ran a simple test to fetch all the orders of a customer.
EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("testjpa");
EntityManager em = entityManagerFactory.createEntityManager();
Customer customer = em.find(Customer.class, 100L);
Collection<Order> orders = customer.getOrders();
for(Order order: orders){
System.out.println(order.getInvoice().getInvoiceId());
}
em.close();
Since everything was lazy fetched, we got four queries, as shown:
1398882535950|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100)
1398882535981|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100)
1398882535995|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 111)
1398882536004|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222)
Since I don't want N+1 calls for getting Invoices, I thought of using batch fetch and reduce the total queries to 4 (One query to fetch Invoices for all the Orders of Customer). To do the same, I updated my Order entity, as shown:
Updated - Order.java, Adding BatchFetch for Invoice. (Excluding getters and setter):
@Entity(name = "ORDERS")
public class Order {
@Id
@Column(name = "ORDER_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long orderId;
@Column(name = "TOTAL_PRICE", precision = 2)
private double totPrice;
@BatchFetch(BatchFetchType.IN)
@OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL, mappedBy = "order")
private Invoice invoice;
@ManyToOne(optional = false)
@JoinColumn(name = "CUST_ID", referencedColumnName = "CUST_ID")
private Customer customer;
@ManyToMany(fetch = FetchType.LAZY)
@JoinTable(name = "ORDER_DETAIL", joinColumns = @JoinColumn(name = "ORDER_ID", referencedColumnName = "ORDER_ID"), inverseJoinColumns = @JoinColumn(name = "PROD_ID", referencedColumnName = "PROD_ID"))
private List<Product> productList;
}
I ran the same test again, and was assuming there will be 3 queries to fetch the data.(One for customer, One for Orders and One to batch fetch Invoices). However, eclipselink generates 5 queries for the same. Here are the queries:
1398883197009|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100)
1398883197030|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100)
1398883197037|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (?,?))|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (111,222))
1398883197042|1|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = 222)
1398883197045|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222)
I didn't understand why the last two queries are being generated. Any help to explain what's going on would be helpful.
Thanks!
Looks like a bug/problem in EclipseLink due to the traversal of eager relationships in the object model that allows the loading of the second Invoice in the "in" before the Order that reference it is loaded. This forces the Invoice to query for the Order in the database, instead of finding it in the cache.
You can fix this by using lazy fetching on the Invoice to Order relationship. This delay will allow EclipseLink to fully build the object model, so that it will be in the cache when it is accessed. The code in the question shows this relationship is marked as Lazy, but this is only a hint to JPA providers that cannot work In EclipseLink without the use of an agent or byte code weaving as described here: https://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Advanced_JPA_Development/Performance/Weaving https://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Advanced_JPA_Development/Performance/Weaving/Dynamic_Weaving
Weaving is not required for lazy collections, only for 1:1 and other optimizations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With