I am having some issues related with performance while reading thousands of records from the database. I noticed that a pure JDBC query is much more faster that a JPA Native query.
Here is the query
select ID, COL_A, COL_B, COL_C, COL_D, COL_E, COL_F from MY_SUPER_VIEW_V v
where 1=1
and v.ID in (:idList)
and v.DATE_FROM <= :date
and v.DATE_TILL >= :date;
This query returns around 38.000 records.
The in idList has more than 1000 records and because I am using an Oracle DB it needs to be split in n queries.
Further i have a method that converts the Object[] result to my List<Entity>
.
In order to understand the performance issue i created a pure JDBC query and a JPA Native query respectively to compare the results.
Here are the timings.
################ getScoresPureJDBCWithListIds ################
List of Ids retrieved. It took: 00:00:00.096 to execute query on DB using JDBC
It took: 00:00:01.180 to execute query on DB using JDBC query
Creating 24206 Scores records from DB result It took: 00:00:04.440
It took: 00:00:01.038 to execute query on DB using JDBC query
Creating 14445 Scores records from DB result It took: 00:00:04.307
################ getScoresJPANativeQueryWithListIds ################
It took: 00:06:09.450 to execute query on DB using JPA Native query
Creating 24206 Scores records from DB result It took: 00:00:00.009
It took: 00:04:04.879 to execute query on DB using JPA Native query
Creating 14445 Scores records from DB result It took: 00:00:00.007
With Hibernate analytics
################ USING FETCH_SIZE: 2000 ################
################ getScoresPureJDBCWithListIds ################
List of <elements> retrieved. It took: 00:00:00.296 to execute query on DB using JDBC
It took: 00:00:11.940 to execute query on DB using JDBC query
Creating 24206 records from DB result It took: 00:00:02.670
It took: 00:00:13.570 to execute query on DB using JDBC query
Creating 14445 records from DB result It took: 00:00:02.553
################ getScoresJDBCTemplateWithListIds ################
List of <elements> retrieved. It took: 00:00:00.087 to execute query on DB using JDBC
Creating 24206 records from DB result It took: 00:00:04.063
Creating 14445 records from DB result It took: 00:00:04.064
################ getScoresJPANativeQueryAsApplication with hint fetch size 2000 ################
2020-04-22 09:36:30.830 INFO 13262 --- [ main] i.StatisticalLoggingSessionEventListener : Session Metrics {
1232369 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
1448702 nanoseconds spent preparing 1 JDBC statements;
3992364 nanoseconds spent executing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
List of <ids> retrieved. It took: 00:00:00.261 to execute query on DB using JDBC
2020-04-22 09:47:23.739 INFO 13262 --- [ main] i.StatisticalLoggingSessionEventListener : Session Metrics {
73670 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
805772 nanoseconds spent preparing 1 JDBC statements;
651947762290 nanoseconds spent executing 1 JDBC statements; ==> 10 minutes
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
It took: 00:10:52.898 to execute query on DB using JPA Native query
Creating 24206 records from DB result It took: 00:00:00.018
2020-04-22 09:56:00.792 INFO 13262 --- [ main] i.StatisticalLoggingSessionEventListener : Session Metrics {
2758010 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
3096653 nanoseconds spent preparing 1 JDBC statements;
516148003151 nanoseconds spent executing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
It took: 00:08:37.032 to execute query on DB using JPA Native query
Creating 14445 records from DB result It took: 00:00:00.006
For the JDBC query I can see 1) that executing the query is quite fast, but 2) processing each ResultSet element in a loop takes the most of the time 00:09 seconds int total
On the other for the JPA Native query 1) executing the query by calling the query.getResultList() method takes a lot of time 10:14 seconds on the other hand 2) processing each result is quite fast here. Analytics shows that a huge amount of time is spent on executing 1 JDBC statement. Even with FETCH_SIZE = 2000 nothing changed significantly.
Why JPA Native is quite slow when compare with pure JDBC? Would it be the type conversions? In my case I am talking about varchar2 and numbers. I was expecting identical results to JDBC. but from 8 seconds to 10mins its a lot.
What can I do to improve the JPA Native query?
You seem to compare two different queries, quite possibly resulting in the database coming up with different query plans.
There are lots of ways to investigate the problem but none of those are available to us because you don't provide a minimal reproducible example. I'll therefore suggest some options for you to investigate this yourself:
Please note, that if you want to compare two concepts you must try to isolate the main feature and get rid of other factors, that may disturb the result.
So to see if the JDBC query and JPA native query differes in behaviour I'd propose following scenario:
use only one query with the 1000 element list
use a plain table instead of a view
Here a simple setup to validate a performance. The table has 50 rows for each GRP_ID
resulting in getting 50K rows for 1000 keys (see below the script to setup the table)
List params = (13001L..14000L)
def query = session.createNativeQuery("select * from tab where grp_id in (:paramsList) ")
query.setFetchSize(2000)
query.setParameterList("paramsList", params);
result = query.getResultList();
Sample run shows this result
got 50000 rows in 1.388 seconds
So I thing there is no need to repeat the test with plain JDBC you will see a comparable result.
What is more interesting is to repeat the run and remove the line
query.setFetchSize(2000)
which will effectivelly reset the fetch size to the default (was 20 in my case), the result for the same data is
got 50000 rows in 1 minutes, 0.903 seconds
1) So the fetch size is the most propable explanation of the observed behaviour. The important thing is to check, if the JDBC drive got the right value and uses it - in doubt you must use the 10046 trace to see what fetch size uses the database. But for me the above statement worked perfectly.
2) There is no substantial difference between a native JPA query and a manually written JDBC execute + fetch of a prepared statement that would explain your observation. Both perform execute of the statement in the database followed by a number of fetches - the count depend on the used fetch size
3) Of course the view can also have influence, but it will be a difference in the query - not betwen the JDBC v. JPA.
4) You didn't mention it, so I'm not going in details here and assume your view doesn't contain any CLOB
columns. This could of course play a role.
5) The last point is in your mention of two queries - do you use two independent queries or one query whith OR
concatenated IN list? You din not provide details so it's hard to comment. Anyway two independent queries shoudl have no influence.
Having said that one word of warning.
The limitation of the IN list count has its purpose. It is acceptable for an ad Hoc script to use a large IN list selection, but for a regular running query this could be a parsing problem. Why?
You use a bind variables to be able to consider the following quereis as a single statament (that is parsed only once)
select * from tab where ID = 1
select * from tab where ID = 2
which leads to
select * from tab where ID = ?
But following two queries (with different length of the IN list) remains different and must be each extra parsed
select * from tab where ID in ( ? )
select * from tab where ID in ( ?, ? )
So re-thing if for your purpose with 30K rows+ the Hibernate is the best option
Hibernate was designed to elegantly get gid of the need of using SQL which is by majority of developers considered a cool think (contrary to the majority of the DB people that have an opposite meaning;).
This concept works fine, the simple the use case is the better. On the other side for batch processing it is sometimes better to approach it directly with SQL
Test Data
create table tab as
select
rownum id,
trunc(rownum / 50) +1 grp_id,
rpad('x',100,'y') pad
from dual connect by level <= 1000000;
create index idx on tab(grp_id);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With