
Apache Derby INSERTS are slow

I am using Apache Derby to store a large number of rows, on the order of tens of millions. Each batch insert adds up to 2 million more rows to the table. The table has a UUID as its primary key and a single foreign-key constraint to a UUID in one other table. The insert takes hours! Why? I had created indexes on all the tables, but I have since removed them, as I believe Derby automatically creates an index for each table with a primary key. I am using a batch update with a prepared statement, as shown (in very simplified form) below:

final PreparedStatement addStatement = connection.prepareStatement(...);
int entryCount = 0;
for (final T entry : entries) {
    // set the statement parameters from entry here, then queue the row
    addStatement.addBatch();
    entryCount++;
    if (entryCount % 1000 == 0) {
        addStatement.executeBatch();
        addStatement.clearBatch();
        entryCount = 0;
    }
}
addStatement.executeBatch(); // flush any remaining queued rows
addStatement.close();

Here are the results

05/01/12 12:42:48 Creating 2051469 HE Peaks in DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:44:18 Progress: Written (10%) 205146/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:46:51 Progress: Written (20%) 410292/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:50:46 Progress: Written (30%) 615438/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 12:56:46 Progress: Written (40%) 820584/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:04:29 Progress: Written (50%) 1025730/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:13:19 Progress: Written (60%) 1230876/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:22:54 Progress: Written (70%) 1436022/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:34:53 Progress: Written (80%) 1641168/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:47:02 Progress: Written (90%) 1846314/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE
05/01/12 13:58:09 Completed: Written (100%) 2051469/2051469 entries to DB Table APP.ST_HE_PEAK_TABLE - Time Taken:01:15:21

As I insert more and more rows, the process gets slower and slower (probably because of the index). The DB model I have at the moment serves my purposes well and I am reluctant to change it. Am I doing something wrong? Am I expecting too much? Is there any way to improve the INSERT speed?


Edit

I have already tuned the DB, using autocommit settings among other things. I found that when inserting millions of records, it still takes an unreasonable amount of time for my application. SELECTs on this data are, of course, very fast.

Asked Jan 05 '12 by Andy Tudor

1 Answer

Have you tried turning off autocommit mode? From http://db.apache.org/derby/docs/dev/tuning/tuningderby.pdf:

Inserts can be painfully slow in autocommit mode because each commit involves an update of the log on the disk for each INSERT statement. The commit will not return until a physical disk write is executed. To speed things up:

  • Run in autocommit false mode, execute a number of inserts in one transaction, and then explicitly issue a commit.
  • If your application allows an initial load into the table, you can use the import procedures to insert data into a table. Derby will not log the individual inserts when loading into an empty table using these interfaces. See the Derby Reference Manual and the Derby Server and Administration Guide for more information on the import procedures.
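The first suggestion can be sketched in JDBC roughly as follows. This is a sketch under stated assumptions: the column names (ID, PARENT_ID) and the chunk size of 1000 are illustrative stand-ins, not taken from the question's actual schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertDemo {

    // Disable autocommit, queue rows in chunks, and commit each chunk
    // explicitly, so Derby flushes the log once per commit rather than
    // once per INSERT. Column names ID/PARENT_ID are hypothetical.
    public static void insertBatch(Connection connection, List<String[]> rows)
            throws SQLException {
        connection.setAutoCommit(false); // one log flush per commit, not per INSERT
        try (PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO APP.ST_HE_PEAK_TABLE (ID, PARENT_ID) VALUES (?, ?)")) {
            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++count % 1000 == 0) {
                    ps.executeBatch();
                    connection.commit(); // commit each 1000-row chunk
                }
            }
            ps.executeBatch();   // flush the final partial chunk
            connection.commit();
        }
    }
}
```

For the second suggestion, Derby's built-in bulk-load procedure SYSCS_UTIL.SYSCS_IMPORT_TABLE can load a file directly into an (ideally empty) table; its exact arguments are documented in the Derby Server and Administration Guide.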
Answered Nov 02 '22 by Chris Shain