Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spring Data Query Execution Optimization: Parallel Execution of Hibernate @Query Method in JpaRepository

I have a Dashboard view, which requires small sets of data from tables all over the database. I optimized the database queries (e.g. removed sub-queries). There are now ~20 queries which are executed one after the other, and which are fetching different data sets from the database. Most of the HQL queries contain GROUP BY and JOIN clauses. With a Spring REST interface, the result is returned to the front-end.

How do I optimize the execution of the custom queries? My initial thought was to run the database queries in parallel. But how do I achieve that? After doing some research I found the annotation @Async which makes it possible to run methods in parallel. But does this work with Hibernate methods? Is there always a new database session created for every method annotated with @Query in a JpaRepository? Does running a database query have an effect on the overall execution time after all?

Another way to run the database calls in parallel, is splitting the Dashboard call into several single Ajax calls (every concern gets its own Ajax call). I didn't want to do that, because every time the dashboard is opened (or e.g. the date range is changed), another 20 Ajax calls are made to fetch the new data. And the same question remains: Does running SQL queries in parallel have an effect on the execution time of the database?

I currently did not yet add additional indices to the database. This will be the next thing, I definitely will be doing. However, I'm interested on the performance impacts of running the queries in parallel and on how to achieve this programmatically with Spring.

My project was initially generated by jHipster (Spring Boot, MariaDB, AngularJS etc.)

like image 819
ssc-hrep3 Avatar asked Aug 14 '16 07:08

ssc-hrep3


2 Answers

First, running these SQLs in parallel will not impact the database and it will only make the page load faster, so the design should focus on that.

I am posting this answer assuming that you have already made sure that you cannot combine these 20 SQLs because the data is unrelated (no joins, views, etc).

I would advise against using @Async for 2 reasons.

Reason 1 - An asynchronous task is great when you want to fire a bunch of tasks and forget, or when you know when all the tasks will be complete. So you will need to "wait" for all your asynchronous tasks to complete. How long should you wait? Until the slowest query is done?

Check this sample code for Async (from the guides @ spring.io --https://spring.io/guides/gs/async-method/)

// Wait until they are all done
while (!(page1.isDone() && page2.isDone() && page3.isDone())) {
     Thread.sleep(10); //10-millisecond pause between each check
}

Will/should your service component wait on 20 Async DAO queries?

Reason 2 - Remember that Async is just spawning off the task as a thread. Since you are going to work with JPA, remember Entity managers are not thread-safe. And DAO classes will propagate transactions. Here is an example of problems that may crop up - http://alexgaddie.blogspot.com/2011/04/spring-3-async-with-hibernate-and.html

IMHO, it is better to go ahead with multiple Ajax calls, because that will make your components cohesive. Yes, you will have 20 endpoints, but they would have a simpler DAO, simpler SQL, easily unit testable and the returned data structure will be easier to handle/parse by the AngularJS widgets. When the UI triggers all 20 Ajax calls, the dashboard would be loading individual widgets when they are ready, instead of loading all of them at the same time. This will help you extend your design in future by optimizing the slower loading sections of your dashboard (maybe caching, indexing, etc).

Bunching your DAO calls will only make the data structure complex and unit testing harder.

like image 79
Shankar P S Avatar answered Sep 22 '22 14:09

Shankar P S


Normally it will be much faster to execute the queries in parallel. If you are using Spring data and do not configure anything specific your JPA provider (Hibernate) will create a connection pool that stores connections to your data base. I think by default Hibernate holds 10 connections and by doing so it is prepared to do 10 queries in parallel. How much faster the queries are by running them in parallel depends on the database and the structure of the tables / queries. I think that using @Async is not the best practice here. Defining 20 REST endpoints that provides the result of a specific query is a much better approach. By doing so you can simple create the Entity, Repository and RestEndpoint class for each query. By doing so each query is isolated and the code is less complex.

like image 28
Hendrik Ebbers Avatar answered Sep 18 '22 14:09

Hendrik Ebbers