I am playing with different buffer sizes for inserts into a local SQLite DB and have found that it takes nearly 8 minutes to insert 10,000,000 rows of data when the buffer size is 10,000. In other words, it takes 1,000 writes to store everything.
8 minutes to store 10,000,000 rows seems a bit too long (or is it?)
Can any of the below be optimized to increase the speed? Please note that the data being inserted is a random collection of characters.
public int flush() throws SQLException {
    String sql = "insert into datastore values(?,?,?,?);";
    PreparedStatement prep = con.prepareStatement(sql);
    for (DatastoreElement e : content) { // content is 10,000 elements long
        _KVPair kvp = e.getKvp();
        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());
        prep.addBatch();
    }
    int[] updateCounts = prep.executeBatch();
    con.commit();
    return errorsWhileInserting(updateCounts);
}
The table is created via
statement.executeUpdate("create table datastore (" +
    "meta_hash INTEGER," +
    "kv_hash INTEGER," +
    "key TEXT," +
    "value TEXT);");
Can any of the above be further optimized please?
I'm a bit hazy on the Java API, but I think you should start a transaction first; otherwise calling commit() is pointless. Do it with conn.setAutoCommit(false). Otherwise SQLite will journal each individual insert/update, which requires syncing the file and contributes to the slowness.
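A minimal sketch of what I mean, assuming con is the same java.sql.Connection used in the flush() method above:

// Disabling auto-commit makes the whole batch run in one transaction,
// so SQLite syncs the journal once instead of once per insert.
con.setAutoCommit(false);
try {
    flush();        // executes the batch and calls con.commit() itself
} catch (SQLException e) {
    con.rollback(); // undo the partial batch on failure
    throw e;
}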
EDIT: The questioner updated to say that this is already done. In that case:
That is a lot of data, and that length of time doesn't sound out of this world. The best you can do is to run tests with different buffer sizes. There is a balance between the overhead of buffers that are too small and virtual memory kicking in when they are too large. For this reason, you shouldn't try to put it all into one buffer at once. Split the inserts up into your own batches, and measure, as in the rough sketch below.
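For example, a rough timing harness might look like this. Note that insertAll is a hypothetical helper (not from the question) that inserts the full data set, calling flush() once per batchSize rows:

for (int batchSize : new int[] {1_000, 10_000, 100_000}) {
    long start = System.nanoTime();
    insertAll(batchSize); // hypothetical: full load with the given buffer size
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(batchSize + " rows per flush: " + elapsedMs + " ms");
}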
You are only executing executeBatch once, which means that all 10 million statements are sent to the database in that single executeBatch call. This is far too much for a database to handle at once. You should additionally execute int[] updateCounts = prep.executeBatch(); inside your loop, say every 1,000 rows. Just add an if statement that tests counter % 1000 == 0. Then the database can already start working asynchronously on the data you have sent.
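A sketch of the suggested change, reusing the names from the question's flush() method:

int counter = 0;
for (DatastoreElement e : content) {
    _KVPair kvp = e.getKvp();
    prep.setInt(1, e.getMetaHash());
    prep.setInt(2, kvp.hashCode());
    prep.setString(3, kvp.getKey());
    prep.setString(4, kvp.getValue());
    prep.addBatch();
    if (++counter % 1000 == 0) {
        prep.executeBatch(); // send this chunk so the database can start working
    }
}
prep.executeBatch(); // send any remaining rows
con.commit();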