
Fastest way for doing INSERTS using IBATIS

I need to insert 20,000 rows into a single table (SQL Server 2005) using iBatis. What's the fastest way to do it? I'm already using batch mode, but it didn't help much:

try {
  sqlMap.startTransaction();
  sqlMap.startBatch();
  // ... execute statements in between
  sqlMap.commitTransaction();
} finally {
  sqlMap.endTransaction();
}
muriloq asked Jan 25 '23 00:01


1 Answer

Barring the bulk loaders others are referring to, let's consider how to best do it through SQL. (And the bulk loaders don't work well if you're sending intermixed data to different tables.)

First, you shouldn't be using an abstraction layer (in this case iBatis) for this job. It offers little value for a bulk insert, and it has some CPU cost (not necessarily much, but some). You should really just use a raw database connection.

Next, you'll be sending in a mess of INSERT statements. The question is whether you should use a simple string for the statement (i.e. INSERT INTO TABLE1 VALUES('x','y', 12)) or a prepared statement (INSERT INTO TABLE1 VALUES(?, ?, ?)).
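
To make the two statement forms concrete, here is a minimal Java sketch that produces both texts. The class name InsertText and its helper methods are hypothetical, not part of JDBC or iBatis. Doubling single quotes is standard SQL literal escaping, but treat it as an illustration and not as a complete defense against SQL injection.

```java
import java.util.Collections;

final class InsertText {
    // Literal form: every value is rendered into the SQL text itself,
    // so numbers (and dates) must be converted to strings on the client.
    static String literal(String table, String a, String b, int n) {
        return "INSERT INTO " + table + " VALUES(" + quote(a) + ", " + quote(b) + ", " + n + ")";
    }

    // Parameterized form: one '?' placeholder per column; the values are
    // bound later through PreparedStatement setters.
    static String parameterized(String table, int columns) {
        return "INSERT INTO " + table + " VALUES("
                + String.join(", ", Collections.nCopies(columns, "?")) + ")";
    }

    // Wraps a value in single quotes and doubles any embedded single quote.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }
}
```

The literal form yields a different string for every row, while the parameterized form yields one string that is reused for all rows.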

That will depend on your database and DB drivers.

The issue with a simple string is basically the cost of converting from an internal format (assuming you're inserting Java data) to a string. Converting a number or date to a String is actually a reasonably expensive CPU operation. Some databases and drivers work with the binary data directly rather than with string data, so in that case a PreparedStatement could save some CPU by potentially not having to convert the data.

The downside is that this factor varies by DB vendor, and potentially even by JDBC driver vendor. For example, Postgres (I believe) only works with SQL strings rather than binary data, so there a PreparedStatement gains nothing over simply building the string yourself.

Next, once you have your statement type, you want to use the addBatch() method of the JDBC Statement interface. What addBatch does is group the SQL statements into, well, a batch. The benefit is that instead of sending several requests to the DB, you send a single LARGE request. This cuts down on network round trips and will give some noticeable gains in throughput.

The catch is that not all drivers and databases support addBatch (at least not well), and the size of a batch is limited. You most likely can't addBatch all 20,000 rows and expect it to work, though a single batch would be ideal. This limit also varies by database.
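
A minimal sketch of the batching described above, assuming a three-column TABLE1 (two strings and an integer, as in the earlier INSERT example) and a caller-supplied java.sql.Connection. The class name ChunkedBatchInsert, the Object[] row layout, and the chunkSize parameter are illustrative assumptions. The right chunk size depends on the driver and database, so any value is something to measure, not a recommendation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class ChunkedBatchInsert {
    // Inserts all rows in one transaction, sending them in chunks of
    // chunkSize rows. Returns the number of executeBatch() calls made.
    static int insertAll(Connection conn, List<Object[]> rows, int chunkSize) throws SQLException {
        int roundTrips = 0;
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit once at the end, not per row
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO TABLE1 VALUES(?, ?, ?)")) {
            int pending = 0;
            for (Object[] row : rows) {
                ps.setString(1, (String) row[0]);
                ps.setString(2, (String) row[1]);
                ps.setInt(3, (Integer) row[2]);
                ps.addBatch(); // queue the row on the client side
                if (++pending == chunkSize) {
                    ps.executeBatch(); // send the queued rows to the database
                    roundTrips++;
                    pending = 0;
                }
            }
            if (pending > 0) { // send the final partial chunk
                ps.executeBatch();
                roundTrips++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return roundTrips;
    }
}
```

With 20,000 rows and a chunkSize of 1000, this would issue 20 executeBatch() calls inside a single transaction; 1000 is only a starting guess to tune against your own driver.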

For Oracle, in the past, I used a buffer of 64K. Basically I wrote a wrapper function that would take literal INSERT statements and accumulate them into 64K batches.
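
The original wrapper is not shown, so the following is only a guess at its shape: a small Java class that queues literal INSERT statements with Statement.addBatch(String) and flushes whenever the next statement would push the pending text past a limit. The class name SizeBoundedBatch is hypothetical, and counting characters as a rough stand-in for the 64K buffer is an assumption.

```java
import java.sql.SQLException;
import java.sql.Statement;

final class SizeBoundedBatch {
    static final int DEFAULT_MAX_CHARS = 64 * 1024; // assumed reading of "64K"

    private final Statement stmt;
    private final int maxChars;
    private int pendingChars = 0;
    private int pendingStatements = 0;

    SizeBoundedBatch(Statement stmt, int maxChars) {
        this.stmt = stmt;
        this.maxChars = maxChars;
    }

    // Queues one literal INSERT; flushes first if it would exceed the limit.
    void add(String insertSql) throws SQLException {
        if (pendingStatements > 0 && pendingChars + insertSql.length() > maxChars) {
            flush();
        }
        stmt.addBatch(insertSql);
        pendingChars += insertSql.length();
        pendingStatements++;
    }

    // Sends whatever is queued; does nothing when the batch is empty.
    void flush() throws SQLException {
        if (pendingStatements == 0) {
            return;
        }
        stmt.executeBatch();
        pendingChars = 0;
        pendingStatements = 0;
    }
}
```

A caller would create one instance per Statement, call add() for each row, and call flush() once at the end before committing.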

So, if you want to bulk insert data through SQL via JDBC, those are the ways to do it. The big improvement comes from batch mode. The choice between Statement and PreparedStatement is more about potentially conserving some CPU, and maybe network traffic if your driver supports a binary protocol.

Test, rinse, and repeat until you're happy enough.

Will Hartung answered Feb 05 '23 16:02
