
Optimizing batch inserts, SQLite

Tags:

java

sqlite

I am playing with different buffer sizes to be inserted into the local SQLite DB and have found that it takes nearly 8 minutes to insert 10,000,000 rows of data when the buffer size is 10,000. In other words, it takes 1,000 batched writes to store everything.

8 minutes to store 10,000,000 rows seems a bit too long (or is it?)

Can any of the below be optimized to increase the speed? Please note that the data being inserted is a random collection of characters.

public int flush() throws SQLException {
    String sql = "insert into datastore values(?,?,?,?);";

    PreparedStatement prep = con.prepareStatement(sql);

    for (DatastoreElement e : content) { // content is 10,000 elements long
        _KVPair kvp = e.getKvp();

        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());

        prep.addBatch();
    }

    int[] updateCounts = prep.executeBatch();

    con.commit();

    return errorsWhileInserting(updateCounts);
}

When table is created it is done via

    statement.executeUpdate("create table datastore " +
               "(meta_hash INTEGER," +
               "kv_hash   INTEGER," +
               "key TEXT," +
               "value TEXT);");

Can any of the above be further optimized please?

asked Aug 23 '12 by James Raitsev

2 Answers

I'm a bit hazy on the Java API, but I think you should start a transaction first; otherwise calling commit() is pointless. Do it with con.setAutoCommit(false). Without a transaction, SQLite journals each individual insert/update, which requires syncing the file to disk and contributes to the slowness.
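A minimal sketch of the idea, assuming `con` is a JDBC connection to the SQLite database; the method name `flushInTransaction` is hypothetical, and the batching body is elided since it matches the questioner's code:

```java
import java.sql.Connection;
import java.sql.SQLException;

public class TransactionSketch {
    // The insert statement from the question, kept as a constant so the
    // shape of the SQL is easy to inspect.
    static final String INSERT_SQL = "insert into datastore values(?,?,?,?)";

    // Wrap the whole flush in one explicit transaction: disable
    // auto-commit first, then commit once at the end. Without this,
    // SQLite journals and fsyncs every single insert.
    static void flushInTransaction(Connection con) throws SQLException {
        con.setAutoCommit(false); // start an explicit transaction
        try {
            // ... prepare INSERT_SQL, addBatch() per row, executeBatch() ...
            con.commit();          // one sync for the whole batch
        } catch (SQLException e) {
            con.rollback();        // discard the partial batch on error
            throw e;
        }
    }
}
```

The key point is that commit() only does useful work if auto-commit was disabled first; otherwise every addBatch'd insert has already been committed individually.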

EDIT: The questioner updated to say that this is already the case. If so:

That is a lot of data, and that length of time doesn't sound out of this world. The best you can do is test with different buffer sizes: there is a balance between the overhead of many small batches and virtual memory kicking in for very large ones. For this reason, you shouldn't try to put it all into one buffer at once; split the inserts into batches of your own.

answered Oct 26 '22 by Joe


You are only executing executeBatch once, which means that all 10 million statements are sent to the database in a single executeBatch call. This is far too much for the database to handle at once. You should additionally execute int[] updateCounts = prep.executeBatch(); inside your loop, say every 1,000 rows: just add an if statement that tests counter % 1000 == 0. The database can then already start working on the data you have sent while you build the next chunk.
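A sketch of the loop with periodic executeBatch() calls. The class name, the BATCH_SIZE constant, and the shouldFlush helper are hypothetical, and the rows parameter (plain string arrays) stands in for the question's list of DatastoreElement objects:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChunkedBatch {
    static final int BATCH_SIZE = 1000; // flush every 1,000 rows, as suggested

    // Pure helper: should we flush after adding the row with this
    // 1-based counter? Extracted so the boundary logic is easy to test.
    static boolean shouldFlush(int counter) {
        return counter % BATCH_SIZE == 0;
    }

    // The question's loop, reshaped to send a chunk to the database
    // every BATCH_SIZE rows instead of one giant batch at the end.
    static void insertAll(Connection con, List<String[]> rows) throws SQLException {
        try (PreparedStatement prep =
                 con.prepareStatement("insert into datastore values(?,?,?,?)")) {
            int counter = 0;
            for (String[] row : rows) {
                prep.setString(1, row[0]);
                prep.setString(2, row[1]);
                prep.setString(3, row[2]);
                prep.setString(4, row[3]);
                prep.addBatch();
                counter++;
                if (shouldFlush(counter)) {
                    prep.executeBatch(); // send this chunk to the database
                }
            }
            prep.executeBatch(); // flush the final partial chunk
            con.commit();
        }
    }
}
```

With 10,000,000 rows this issues 10,000 executeBatch calls of 1,000 rows each, all inside one transaction, rather than buffering every statement in memory at once.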

answered Oct 26 '22 by keiki