 

Improve JDBC Performance

I am executing the following set of statements in my Java application. It connects to an Oracle database.

stat=connection.createStatement();
stat1=connection.createStatement();
ResultSet rs = stat.executeQuery(BIGQUERY);
while(rs.next()) {
    obj1.setAttr1(rs.getString(1));
    obj1.setAttr2(rs.getString(2));
    obj1.setAttr3(rs.getString(3));
    obj1.setAttr4(rs.getString(4));

    ResultSet rs1 = stat1.executeQuery(SMALLQ1);
    while(rs1.next()) {
       obj1.setAttr5(rs1.getString(1));
    }

     ResultSet rs2 = stat1.executeQuery(SMALLQ2);
    while(rs2.next()) {
       obj1.setAttr6(rs2.getString(1));

    }
         .
         .
         .
 linkedBlockingQueue.add(obj1);
 }
 // all statements and connections are closed

The BIGQUERY returns around 4.5 million records, and for each record I have to execute 14 smaller queries. Each small query contains 3 inner joins.

My multi-threaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records within 20 hours. I am using about 200 threads which run the above code and store the records in a LinkedBlockingQueue.

Does blindly increasing the thread count help increase performance, or is there some other way I can increase the performance of the result sets?

PS: I am unable to post the queries here, but I am assured that all of them are optimized.

asked Sep 15 '14 by user2133404

People also ask

Why is JDBC so slow?

Establishing a JDBC connection with a DBMS can be very slow. If your application requires database connections that are repeatedly opened and closed, this can become a significant performance issue.
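
A common way to avoid this cost is to obtain connections from a connection pool and reuse them. A minimal sketch using HikariCP (a pooling library not mentioned on this page; the JDBC URL, credentials and pool size are placeholders):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/ORCL");   // placeholder URL
config.setUsername("user");                                  // placeholder credentials
config.setPassword("secret");
config.setMaximumPoolSize(20);
HikariDataSource dataSource = new HikariDataSource(config);

try (Connection conn = dataSource.getConnection()) {
    // borrowed from the pool; close() returns it to the pool instead of tearing it down
}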

Which JDBC driver is fastest?

The native-protocol pure Java driver (Type 4) is the fastest driver because it converts JDBC calls directly into the vendor-specific database protocol and interacts with the database without any middleware.

Using which JDBC API will the performance of the application be faster?

The performance of the application will be faster if you use the PreparedStatement interface, because the query is compiled only once.


1 Answer

To improve JDBC performance in your scenario, you can apply several modifications.

As you will see, all these modifications can significantly speed up your task.

1. Using batch operations.

You can read the results of your big query and store them in a buffer. Only when the buffer is full do you run the sub-query for all the data collected in the buffer. This significantly reduces the number of SQL statements to execute.

static final int BATCH_SIZE = 1000; 
List<MyData> buffer = new ArrayList<>(BATCH_SIZE);

while (rs.next()) {

  MyData record = new MyData( rs.getString(1), ..., rs.getString(4) );
  buffer.add( record );

  if (buffer.size() == BATCH_SIZE) {
    processBatch( buffer );
    buffer.clear();
  }
}
if (!buffer.isEmpty()) {
  processBatch( buffer );   // process the last, partially filled batch
}

void processBatch( List<MyData> buffer ) {

  String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
  ResultSet rs1 = stat1.executeQuery(sql);  // one query for all IDs in the buffer
  while (rs1.next()) { ... }
  ...  
}

2. Using efficient maps to store content from many selects.

If your records are not too big, you can store them all in memory at once, even for a 4-million-row table.

I have used this approach many times for nightly processes (with no regular users online). Such an approach may need a lot of heap memory (e.g. 100 MB - 1 GB), but it is much faster than approach 1.

To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap (etc.), which is much better than the standard Java library maps.

final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1
while (rs.next()) {
  MyData record = new MyData( rs.getInt(1), rs.getString(2), ..., rs.getString(4) );
  map.put(record.getId(), record);
}

// query 2
while (rs.next()) {
  int id = rs.getInt(1);   // my data id
  String x = rs.getString(...);
  int y = rs.getInt(...);

  MyData record = map.get(id);
  record.add( new MyDetail(x,y) );
}

// query 3
// same pattern as query 2 

After this you have a map filled with all the collected data, probably with a lot of memory allocated. This is why you can use this method only if you have such resources.

Another consideration is how to write the MyData and MyDetail classes to be as small as possible. You can use a few tricks (see the sketch after this list):

  1. storing 3 integers (with limited ranges) in one long variable (using a small utility for bit shifting)
  2. storing Date objects as an integer (yymmdd)
  3. calling str.intern() for each string fetched from the DB
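
As an illustration of tricks 1 and 2, here is a minimal sketch; the class name, field names and the 21-bit value ranges are assumptions for illustration, not taken from the answer itself.

class PackedDetail {

    private final long packed;     // three limited-range ints packed into one long
    private final int dateYymmdd;  // a Date stored as an int, e.g. 140915 for 2014-09-15

    PackedDetail(int a, int b, int c, int dateYymmdd) {
        // assumption: each of a, b and c fits in 21 bits (0 .. ~2 million)
        this.packed = ((long) a << 42) | ((long) b << 21) | c;
        this.dateYymmdd = dateYymmdd;
    }

    int getA() { return (int) (packed >>> 42); }
    int getB() { return (int) ((packed >>> 21) & 0x1FFFFF); }
    int getC() { return (int) (packed & 0x1FFFFF); }
    int getDateYymmdd() { return dateYymmdd; }
}

Trick 3 is simply calling rs.getString(n).intern() when populating such objects, so that repeated column values share a single String instance.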

3. Transactions

If you have to do some updates or inserts, then 4 million records is too much to handle in one transaction; that is too much for most database configurations. Use approach 1 and commit the transaction for each batch. On each newly inserted record you can store something like a RUN_ID, and if everything goes well you can mark that RUN_ID as successful.

If your queries only read, there is no problem. However, you can mark the transaction as read-only to help your database.
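
A rough sketch of per-batch commits; the table name, column names and MyData accessor are placeholders, not from the original question.

void writeBatch(Connection connection, long runId, List<MyData> buffer) throws SQLException {
    connection.setAutoCommit(false);   // take control of transaction boundaries
    try (PreparedStatement insert = connection.prepareStatement(
            "insert into RESULTS (RUN_ID, ATTR1) values (?, ?)")) {   // placeholder SQL
        for (MyData record : buffer) {
            insert.setLong(1, runId);
            insert.setString(2, record.getAttr1());
            insert.addBatch();
        }
        insert.executeBatch();
        connection.commit();           // one commit per batch, not per record
    } catch (SQLException e) {
        connection.rollback();         // drop the partial batch; its RUN_ID is never marked successful
        throw e;
    }
}

For a read-only job, calling connection.setReadOnly(true) before running the queries gives the database the hint mentioned above.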

4. JDBC fetch size.

When you load a lot of records from the database, it is very important to set a proper fetch size on your JDBC statement. This reduces the number of physical round-trips over the database socket and speeds up your process.

Example:

// jdbc
statement.setFetchSize(500);

// spring     
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);

Here you can find some benchmarks and patterns for using fetch size:

http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html

5. PreparedStatement

Use PreparedStatement rather than Statement.
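
For the 14 small queries from the question, that means preparing each query once with a bind parameter and reusing it for every record. A rough sketch; the SQL text and column names are placeholders, and rs, obj1 and connection are the variables from the question:

PreparedStatement smallQ1 = connection.prepareStatement(
        "select ATTR5 from DETAIL_TABLE where PARENT_ID = ?");   // placeholder SQL

while (rs.next()) {
    // ... read the big-query columns as before ...
    smallQ1.setString(1, rs.getString(1));          // bind the current record's key
    try (ResultSet rs1 = smallQ1.executeQuery()) {
        while (rs1.next()) {
            obj1.setAttr5(rs1.getString(1));
        }
    }
}
smallQ1.close();

The database then parses each small query only once instead of once per record, and bind variables let Oracle reuse the cached execution plan.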

6. Number of SQL statements.

Always try to minimize the number of SQL statements you send to the database.

answered Sep 27 '22 by przemek hertel