I am using Apache Derby to store a large number of rows, on the order of tens of millions. Each time I initiate a batch insert, I insert up to 2 million more rows into the table. The table has a UUID as its primary key and a single foreign-key constraint to a UUID in one other table. The insert takes hours! Why? I originally created indexes on all the tables, but I have since removed them, as I believe Derby automatically creates an index for each table with a primary key. I am using a batch update with a prepared statement, shown in very simplified form below:
final PreparedStatement addStatement = connection.prepareStatement(...);
int entryCount = 0;
for (final T entry : entries) {
    // ... set the statement parameters for this entry ...
    addStatement.addBatch();
    entryCount++;
    if (entryCount % 1000 == 0) {
        addStatement.executeBatch();
        addStatement.clearBatch();
        entryCount = 0;
    }
}
if (entryCount > 0) {
    addStatement.executeBatch(); // flush the final partial batch
}
addStatement.close();
Here are the results:
05/01/12 12:42:48 Creating 2051469 HE Peaks in DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:44:18 Progress: Written (10%) 205146/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:46:51 Progress: Written (20%) 410292/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:50:46 Progress: Written (30%) 615438/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:56:46 Progress: Written (40%) 820584/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:04:29 Progress: Written (50%) 1025730/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:13:19 Progress: Written (60%) 1230876/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:22:54 Progress: Written (70%) 1436022/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:34:53 Progress: Written (80%) 1641168/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:47:02 Progress: Written (90%) 1846314/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:58:09 Completed: Written (100%) 2051469/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE - Time Taken:01:15:21
As I insert more and more rows, the process gets slower and slower (probably because of the index). The DB model I have at the moment serves my purposes well and I am reluctant to change it. Am I doing something wrong? Or am I expecting too much? Is there any way to improve the INSERT speed?
Edit
I have already tuned the DB, including adjusting autocommit and other settings. I still find that INSERTing millions of records takes an unreasonable amount of time for my application. SELECTs on this data are, of course, very fast.
Have you tried turning off autocommit mode? From http://db.apache.org/derby/docs/dev/tuning/tuningderby.pdf:
Inserts can be painfully slow in autocommit mode because each commit involves an update of the log on the disk for each INSERT statement. The commit will not return until a physical disk write is executed. To speed things up:
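To make this concrete, here is a minimal sketch of the commit-per-batch pattern with autocommit disabled. The method signature, table name, and column names (`ID`, `PARENT_ID`) are illustrative assumptions, not taken from your schema; adapt them to your actual INSERT statement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

public class BulkInsert {
    static final int BATCH_SIZE = 1000;

    // Inserts rows with autocommit off, committing once per executed batch,
    // so the log is flushed to disk once per 1000 rows instead of once per row.
    static void bulkInsert(Connection conn, Iterator<String[]> rows) throws SQLException {
        final boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one physical log write per commit, not per INSERT
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO APP.ST_HE_PEAK_TABLE (ID, PARENT_ID) VALUES (?, ?)")) {
            int pending = 0;
            while (rows.hasNext()) {
                final String[] row = rows.next();
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    conn.commit(); // amortize the disk write across the whole batch
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                conn.commit();
            }
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The key difference from your snippet is that each `executeBatch()` is followed by an explicit `commit()`; with autocommit on, Derby instead commits (and waits for a disk write) after every single INSERT in the batch.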