 

Improve JDBC Performance

I am executing the following set of statements in my Java application. It connects to an Oracle database.

stat=connection.createStatement();
stat1=connection.createStatement();
ResultSet rs = stat.executeQuery(BIGQUERY);
while(rs.next()) {
    obj1.setAttr1(rs.getString(1));
    obj1.setAttr2(rs.getString(2));
    obj1.setAttr3(rs.getString(3));
    obj1.setAttr4(rs.getString(4));

    ResultSet rs1 = stat1.executeQuery(SMALLQ1);
    while(rs1.next()) {
       obj1.setAttr5(rs1.getString(1));
    }

     ResultSet rs2 = stat1.executeQuery(SMALLQ2);
    while(rs2.next()) {
       obj1.setAttr6(rs2.getString(1));

    }
         .
         .
         .
 linkedBlockingQueue.add(obj1);
 }
 // all statements and connections are closed

The BIGQUERY returns around 4.5 million records, and for each record I have to execute 14 smaller queries. Each small query contains 3 inner joins.

My multi-threaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records within 20 hours. I am using about 200 threads which run the above code and store the records in a LinkedBlockingQueue.

Does blindly increasing the thread count help increase performance, or is there some other way I can increase the performance of the result sets?

PS: I am unable to post the queries here, but I am assured that all of them are optimized.

asked Sep 15 '14 by user2133404

People also ask

Why is JDBC so slow?

Establishing a JDBC connection with a DBMS can be very slow. If your application requires database connections that are repeatedly opened and closed, this can become a significant performance issue.
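
A common way to avoid this cost is to obtain connections from a connection pool and reuse them. A minimal sketch using HikariCP (a pooling library not mentioned on this page; the JDBC URL, credentials and pool size are placeholders):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/ORCL");   // placeholder URL
config.setUsername("user");                                  // placeholder credentials
config.setPassword("secret");
config.setMaximumPoolSize(20);
HikariDataSource dataSource = new HikariDataSource(config);

try (Connection conn = dataSource.getConnection()) {
    // borrowed from the pool; close() returns it to the pool instead of tearing it down
}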

Which JDBC driver is fastest?

The native-protocol pure Java driver (Type 4) is the fastest driver because it converts JDBC calls directly into the vendor-specific database protocol and interacts with the database without any middleware.

Using which JDBC API will the performance of the application be faster?

The performance of the application will be faster if you use the PreparedStatement interface, because the query is compiled only once.


1 Answer

To improve JDBC performance in your scenario, you can apply several modifications.

As you will see, all these modifications can significantly speed up your task.

1. Using batch operations.

You can read the results of your big query and store them in a buffer. Only when the buffer is full do you run the sub-query for all the data collected in the buffer. This significantly reduces the number of SQL statements to execute.

static final int BATCH_SIZE = 1000; 
List<MyData> buffer = new ArrayList<>(BATCH_SIZE);

while (rs.next()) {

  MyData record = new MyData( rs.getString(1), ..., rs.getString(4) );
  buffer.add( record );

  if (buffer.size() == BATCH_SIZE) {
    processBatch( buffer );
    buffer.clear();
  }
}
if (!buffer.isEmpty()) {
  processBatch( buffer );   // process the last, partially filled batch
}

void processBatch( List<MyData> buffer ) {

  String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
  ResultSet rs1 = stat1.executeQuery(sql);  // one query for all IDs in the buffer
  while (rs1.next()) { ... }
  ...  
}

2. Using efficient maps to store content from many selects.

If your records are not too big, you can store them all in memory at once, even for a 4-million-row table.

I have used this approach many times for nightly processes (with no regular users online). Such an approach may need a lot of heap memory (e.g. 100 MB - 1 GB), but it is much faster than approach 1.

To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap (etc.), which is much better than the standard Java library maps.

final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1
while (rs.next()) {
  MyData record = new MyData( rs.getInt(1), rs.getString(2), ..., rs.getString(4) );
  map.put(record.getId(), record);
}

// query 2
while (rs.next()) {
  int id = rs.getInt(1);   // my data id
  String x = rs.getString(...);
  int y = rs.getInt(...);

  MyData record = map.get(id);
  record.add( new MyDetail(x,y) );
}

// query 3
// same pattern as query 2 

After this you have a map filled with all the collected data, probably with a lot of memory allocated. This is why you can use this method only if you have such resources.

Another consideration is how to write the MyData and MyDetail classes to be as small as possible. You can use a few tricks (see the sketch after this list):

  1. storing 3 integers (with limited ranges) in one long variable (using a small utility for bit shifting)
  2. storing Date objects as an integer (yymmdd)
  3. calling str.intern() for each string fetched from the DB
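
As an illustration of tricks 1 and 2, here is a minimal sketch; the class name, field names and the 21-bit value ranges are assumptions for illustration, not taken from the answer itself.

class PackedDetail {

    private final long packed;     // three limited-range ints packed into one long
    private final int dateYymmdd;  // a Date stored as an int, e.g. 140915 for 2014-09-15

    PackedDetail(int a, int b, int c, int dateYymmdd) {
        // assumption: each of a, b and c fits in 21 bits (0 .. ~2 million)
        this.packed = ((long) a << 42) | ((long) b << 21) | c;
        this.dateYymmdd = dateYymmdd;
    }

    int getA() { return (int) (packed >>> 42); }
    int getB() { return (int) ((packed >>> 21) & 0x1FFFFF); }
    int getC() { return (int) (packed & 0x1FFFFF); }
    int getDateYymmdd() { return dateYymmdd; }
}

Trick 3 is simply calling rs.getString(n).intern() when populating such objects, so that repeated column values share a single String instance.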

3. Transactions

If you have to do some updates or inserts, then 4 million records is too much to handle in one transaction; that is too much for most database configurations. Use approach 1 and commit the transaction for each batch. On each newly inserted record you can store something like a RUN_ID, and if everything goes well you can mark that RUN_ID as successful.

If your queries only read, there is no problem. However, you can mark the transaction as read-only to help your database.
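
A rough sketch of per-batch commits; the table name, column names and MyData accessor are placeholders, not from the original question.

void writeBatch(Connection connection, long runId, List<MyData> buffer) throws SQLException {
    connection.setAutoCommit(false);   // take control of transaction boundaries
    try (PreparedStatement insert = connection.prepareStatement(
            "insert into RESULTS (RUN_ID, ATTR1) values (?, ?)")) {   // placeholder SQL
        for (MyData record : buffer) {
            insert.setLong(1, runId);
            insert.setString(2, record.getAttr1());
            insert.addBatch();
        }
        insert.executeBatch();
        connection.commit();           // one commit per batch, not per record
    } catch (SQLException e) {
        connection.rollback();         // drop the partial batch; its RUN_ID is never marked successful
        throw e;
    }
}

For a read-only job, calling connection.setReadOnly(true) before running the queries gives the database the hint mentioned above.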

4. JDBC fetch size.

When you load a lot of records from the database, it is very important to set a proper fetch size on your JDBC statement. This reduces the number of physical round-trips over the database socket and speeds up your process.

Example:

// jdbc
statement.setFetchSize(500);

// spring     
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);

Here you can find some benchmarks and patterns for using fetch size:

http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html

5. PreparedStatement

Use PreparedStatement rather than Statement.
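
For the 14 small queries from the question, that means preparing each query once with a bind parameter and reusing it for every record. A rough sketch; the SQL text and column names are placeholders, and rs, obj1 and connection are the variables from the question:

PreparedStatement smallQ1 = connection.prepareStatement(
        "select ATTR5 from DETAIL_TABLE where PARENT_ID = ?");   // placeholder SQL

while (rs.next()) {
    // ... read the big-query columns as before ...
    smallQ1.setString(1, rs.getString(1));          // bind the current record's key
    try (ResultSet rs1 = smallQ1.executeQuery()) {
        while (rs1.next()) {
            obj1.setAttr5(rs1.getString(1));
        }
    }
}
smallQ1.close();

The database then parses each small query only once instead of once per record, and bind variables let Oracle reuse the cached execution plan.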

6. Number of SQL statements.

Always try to minimize the number of SQL statements you send to the database.

answered Sep 27 '22 by przemek hertel