Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JDBC Query vs JPA Query Performance

I am having some issues related with performance while reading thousands of records from the database. I noticed that a pure JDBC query is much more faster that a JPA Native query.

Here is the query

select ID, COL_A, COL_B, COL_C, COL_D, COL_E, COL_F from MY_SUPER_VIEW_V v 
where 1=1 
and v.ID in (:idList)
and v.DATE_FROM <= :date
and v.DATE_TILL >= :date;

This query returns around 38.000 records.

The in idList has more than 1000 records and because I am using an Oracle DB it needs to be split in n queries.

Further i have a method that converts the Object[] result to my List<Entity>.

In order to understand the performance issue i created a pure JDBC query and a JPA Native query respectively to compare the results.

Here are the timings.

################ getScoresPureJDBCWithListIds ################
List of Ids retrieved. It took: 00:00:00.096 to execute query on DB using JDBC
It took: 00:00:01.180 to execute query on DB using JDBC query
Creating 24206 Scores records from DB result It took: 00:00:04.440
It took: 00:00:01.038 to execute query on DB using JDBC query
Creating 14445 Scores records from DB result It took: 00:00:04.307
################ getScoresJPANativeQueryWithListIds ################
It took: 00:06:09.450 to execute query on DB using JPA Native query
Creating 24206 Scores records from DB result It took: 00:00:00.009
It took: 00:04:04.879 to execute query on DB using JPA Native query
Creating 14445 Scores records from DB result It took: 00:00:00.007

With Hibernate analytics

################ USING FETCH_SIZE: 2000 ################
################ getScoresPureJDBCWithListIds ################
List of <elements> retrieved. It took: 00:00:00.296 to execute query on DB using JDBC
It took: 00:00:11.940 to execute query on DB using JDBC query
Creating 24206 records from DB result It took: 00:00:02.670
It took: 00:00:13.570 to execute query on DB using JDBC query
Creating 14445 records from DB result It took: 00:00:02.553

################ getScoresJDBCTemplateWithListIds ################
    List of <elements> retrieved. It took: 00:00:00.087 to execute query on DB using JDBC
    Creating 24206 records from DB result It took: 00:00:04.063
    Creating 14445 records from DB result It took: 00:00:04.064
    ################ getScoresJPANativeQueryAsApplication with hint fetch size 2000 ################
    2020-04-22 09:36:30.830  INFO 13262 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
        1232369 nanoseconds spent acquiring 1 JDBC connections;
        0 nanoseconds spent releasing 0 JDBC connections;
        1448702 nanoseconds spent preparing 1 JDBC statements;
        3992364 nanoseconds spent executing 1 JDBC statements;
        0 nanoseconds spent executing 0 JDBC batches;
        0 nanoseconds spent performing 0 L2C puts;
        0 nanoseconds spent performing 0 L2C hits;
        0 nanoseconds spent performing 0 L2C misses;
        0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
        0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
    }
    List of <ids> retrieved. It took: 00:00:00.261 to execute query on DB using JDBC
    2020-04-22 09:47:23.739  INFO 13262 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
        73670 nanoseconds spent acquiring 1 JDBC connections;
        0 nanoseconds spent releasing 0 JDBC connections;
        805772 nanoseconds spent preparing 1 JDBC statements;
        651947762290 nanoseconds spent executing 1 JDBC statements; ==> 10 minutes
        0 nanoseconds spent executing 0 JDBC batches;
        0 nanoseconds spent performing 0 L2C puts;
        0 nanoseconds spent performing 0 L2C hits;
        0 nanoseconds spent performing 0 L2C misses;
        0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
        0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
    }
    It took: 00:10:52.898 to execute query on DB using JPA Native query
    Creating 24206 records from DB result It took: 00:00:00.018
    2020-04-22 09:56:00.792  INFO 13262 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
        2758010 nanoseconds spent acquiring 1 JDBC connections;
        0 nanoseconds spent releasing 0 JDBC connections;
        3096653 nanoseconds spent preparing 1 JDBC statements;
        516148003151 nanoseconds spent executing 1 JDBC statements;
        0 nanoseconds spent executing 0 JDBC batches;
        0 nanoseconds spent performing 0 L2C puts;
        0 nanoseconds spent performing 0 L2C hits;
        0 nanoseconds spent performing 0 L2C misses;
        0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
        0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
    }
    It took: 00:08:37.032 to execute query on DB using JPA Native query
    Creating 14445 records from DB result It took: 00:00:00.006

For the JDBC query I can see 1) that executing the query is quite fast, but 2) processing each ResultSet element in a loop takes the most of the time 00:09 seconds int total

On the other for the JPA Native query 1) executing the query by calling the query.getResultList() method takes a lot of time 10:14 seconds on the other hand 2) processing each result is quite fast here. Analytics shows that a huge amount of time is spent on executing 1 JDBC statement. Even with FETCH_SIZE = 2000 nothing changed significantly.

Why JPA Native is quite slow when compare with pure JDBC? Would it be the type conversions? In my case I am talking about varchar2 and numbers. I was expecting identical results to JDBC. but from 8 seconds to 10mins its a lot.

What can I do to improve the JPA Native query?

like image 778
mpssantos Avatar asked Apr 20 '20 16:04

mpssantos


2 Answers

You seem to compare two different queries, quite possibly resulting in the database coming up with different query plans.

There are lots of ways to investigate the problem but none of those are available to us because you don't provide a minimal reproducible example. I'll therefore suggest some options for you to investigate this yourself:

  • Enable debug logging for your Java application including Hibernate and Oracle JDBC driver as explained in their documentation
  • Watch where the delay is coming from, is it the database, the network, or your Java application? If in doubt check the network traffic with Wireshark on both sides of the connection or check Oracles database statistics on slow/heavy queries before and after the problematic queries
  • If the problem is a slow database make sure your query parameters have matching types with your database index
  • If you are sure the network and database are not causing the issue and debug logging doesn't help you further try using advanced tools like a cpu profiler with e.g. JVisualVM
  • If you're still having problems maybe you have some extreme memory problem like to little system memory causing swapping or very frequent Full Garbage Collection which you can see from Garbage Collection logging
like image 123
JohannesB Avatar answered Oct 08 '22 07:10

JohannesB


Please note, that if you want to compare two concepts you must try to isolate the main feature and get rid of other factors, that may disturb the result.

So to see if the JDBC query and JPA native query differes in behaviour I'd propose following scenario:

  • use only one query with the 1000 element list

  • use a plain table instead of a view

Here a simple setup to validate a performance. The table has 50 rows for each GRP_ID resulting in getting 50K rows for 1000 keys (see below the script to setup the table)

List params = (13001L..14000L)
def query = session.createNativeQuery("select * from tab where grp_id in (:paramsList) ")
query.setFetchSize(2000)
query.setParameterList("paramsList", params);
result = query.getResultList();

Sample run shows this result

 got 50000 rows in 1.388 seconds

So I thing there is no need to repeat the test with plain JDBC you will see a comparable result.

What is more interesting is to repeat the run and remove the line

query.setFetchSize(2000)

which will effectivelly reset the fetch size to the default (was 20 in my case), the result for the same data is

 got 50000 rows in 1 minutes, 0.903 seconds

1) So the fetch size is the most propable explanation of the observed behaviour. The important thing is to check, if the JDBC drive got the right value and uses it - in doubt you must use the 10046 trace to see what fetch size uses the database. But for me the above statement worked perfectly.

2) There is no substantial difference between a native JPA query and a manually written JDBC execute + fetch of a prepared statement that would explain your observation. Both perform execute of the statement in the database followed by a number of fetches - the count depend on the used fetch size

3) Of course the view can also have influence, but it will be a difference in the query - not betwen the JDBC v. JPA.

4) You didn't mention it, so I'm not going in details here and assume your view doesn't contain any CLOB columns. This could of course play a role.

5) The last point is in your mention of two queries - do you use two independent queries or one query whith OR concatenated IN list? You din not provide details so it's hard to comment. Anyway two independent queries shoudl have no influence.

Having said that one word of warning.

The limitation of the IN list count has its purpose. It is acceptable for an ad Hoc script to use a large IN list selection, but for a regular running query this could be a parsing problem. Why?

You use a bind variables to be able to consider the following quereis as a single statament (that is parsed only once)

select * from tab where ID = 1
select * from tab where ID = 2

which leads to

select * from tab where ID = ?

But following two queries (with different length of the IN list) remains different and must be each extra parsed

select * from tab where ID in ( ? )
select * from tab where ID in ( ?, ? )

So re-thing if for your purpose with 30K rows+ the Hibernate is the best option

Hibernate was designed to elegantly get gid of the need of using SQL which is by majority of developers considered a cool think (contrary to the majority of the DB people that have an opposite meaning;).

This concept works fine, the simple the use case is the better. On the other side for batch processing it is sometimes better to approach it directly with SQL

Test Data

create table tab as 
select 
rownum id,
trunc(rownum /  50) +1 grp_id,
rpad('x',100,'y') pad
from dual connect by level <= 1000000;
create index idx on tab(grp_id);
like image 42
Marmite Bomber Avatar answered Oct 08 '22 06:10

Marmite Bomber