Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bulk insert in Java using prepared statements batch update

I am trying to fill a resultSet in Java with about 50,000 rows of 10 columns and then inserting them into another table using the batchExecute method of PreparedStatement.

To make the process faster I did some research and found that while reading data into resultSet the fetchSize plays an important role.

Having a very low fetchSize can result into too many trips to the server and a very high fetchSize can block the network resources, so I experimented a little bit and set up an optimum size that suits my infrastructure.

I am reading this resultSet and creating insert statements to insert into another table of a different database.

Something like this (just a sample, not real code):

for (i=0 ; i<=50000 ; i++) {     statement.setString(1, "[email protected]");     statement.setLong(2, 1);     statement.addBatch(); } statement.executeBatch(); 
  • Will the executeBatch method try to send all the data at once ?
  • Is there a way to define the batch size?
  • Is there any better way to speed up the process of bulk insertion?

While updating in bulk (50,000 rows 10 cols), is it better to use a updatable ResultSet or PreparedStaement with batch execution?

like image 354
Mrinmoy Avatar asked Jul 31 '11 20:07

Mrinmoy


People also ask

Can we use PreparedStatement for update query?

The Statement. executeUpdate method works if you update data server tables with constant values. However, updates often need to involve passing values in variables to the tables. To do that, you use the PreparedStatement.

What is batch in PreparedStatement?

You can load batches of data into Vertica using prepared INSERT statements—server-side statements that you set up once, and then call repeatedly. You instantiate a member of the PreparedStatement class with a SQL statement that contains question mark placeholders for data.


1 Answers

I'll address your questions in turn.

  • Will the executeBatch method tries to send all the data at once?

This can vary with each JDBC driver, but the few I've studied will iterate over each batch entry and send the arguments together with the prepared statement handle each time to the database for execution. That is, in your example above, there would 50,000 executions of the prepared statement with 50,000 pairs of arguments, but these 50,000 steps can be done in a lower-level "inner loop," which is where the time savings come in. As a rather stretched analogy, it's like dropping out of "user mode" down into "kernel mode" and running the entire execution loop there. You save the cost of diving in and out of that lower-level mode for each batch entry.

  • Is there a way to define the batch size

You've defined it implicitly here by pushing 50,000 argument sets in before executing the batch via Statement#executeBatch(). A batch size of one is just as valid.

  • Is there any better way to speed up the process of bulk insertion?

Consider opening a transaction explicitly before the batch insertion, and commit it afterward. Don't let either the database or the JDBC driver impose a transaction boundary around each insertion step in the batch. You can control the JDBC layer with the Connection#setAutoCommit(boolean) method. Take the connection out of auto-commit mode first, then populate your batches, start a transaction, execute the batch, then commit the transaction via Connection#commit().

This advice assumes that your insertions won't be contending with concurrent writers, and assumes that these transaction boundaries will give you sufficiently consistent values read from your source tables for use in the insertions. If that's not the case, favor correctness over speed.

  • Is it better to use a updatable ResultSet or PreparedStatement with batch execution?

Nothing beats testing with your JDBC driver of choice, but I expect the latter—PreparedStatement and Statement#executeBatch() will win out here. The statement handle may have an associated list or array of "batch arguments," with each entry being the argument set provided in between calls to Statement#executeBatch() and Statement#addBatch() (or Statement#clearBatch()). The list will grow with each call to addBatch(), and not be flushed until you call executeBatch(). Hence, the Statement instance is really acting as an argument buffer; you're trading memory for convenience (using the Statement instance in lieu of your own external argument set buffer).

Again, you should consider these answers general and speculative so long as we're not discussing a specific JDBC driver. Each driver varies in sophistication, and each will vary in which optimizations it pursues.

like image 127
seh Avatar answered Sep 27 '22 18:09

seh