I am working on a desktop application that uses SQLite to bulk insert tens of thousands of rows into a database. I would like help optimizing the bulk insert performance; it currently takes up to 50 seconds to insert 60 MB worth of data.
What connection string parameters could I use to improve performance? Should I change the buffer size, and is that possible via a connection string parameter? Are there any other connection string parameters that improve performance? My current connection string is:
Data Source=Batch.db;Version=3;Pooling=True;Max Pool Size=10;Synchronous=off;FailIfMissing=True;Journal Mode=Off;
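(For reference, the Synchronous and Journal Mode settings above can also be applied with PRAGMA statements after the connection is opened. This is just a minimal sketch; cnn stands in for the open SQLiteConnection.)
// Sketch only: mirrors Synchronous=Off and Journal Mode=Off from the connection string.
using (var pragma = cnn.CreateCommand())
{
    pragma.CommandText = "PRAGMA synchronous = OFF; PRAGMA journal_mode = OFF;";
    pragma.ExecuteNonQuery();
}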
I am using Dapper ORM (built by the team at Stack Overflow). Is there a faster way to bulk insert into SQLite in .NET?
I am using System.Data.SQLite to insert into SQLite. Would a specially compiled version of SQLite improve performance? Is one version of SQLite better than another? I am currently using System.Data.SQLite from http://sqlite.phxsoftware.com
Currently, I am wrapping inserts inside a transaction to make them faster (this made a big improvement); see the sketch below.
I am inserting into 17 tables, one table at a time. Could I parallelize this across different threads and make it faster?
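(A minimal sketch of the transaction wrapping, assuming cnn is an open SQLiteConnection and the insert loop goes where indicated:)
// Sketch only: commit all inserts in a single transaction.
using (var transaction = cnn.BeginTransaction())
{
    // ... execute all of the INSERT commands here ...
    transaction.Commit();
}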
Is my current performance typical? Can I do better?
I like SQLite, but I would love to make it a bit faster. Currently, saving my objects to an XML file using XML serialization is faster than saving them to a SQLite database, so my boss is asking: why switch to SQLite? Or should I be using MongoDB or some other object database?
So I finally found the trick to high-performance bulk inserts in SQLite using .NET. This trick improved insert performance by a factor of 4.1: my total save time went from 27 seconds to 6.6 seconds. Wow!
This is the fastest way I found to do bulk inserts into SQLite. The key is to reuse the same parameter objects, assigning new values to them for each record you insert. The time that .NET spends constructing all those DbParameter objects really adds up: with 100k rows and 30 columns, that is 3 million parameter objects that must be created. Creating and reusing just 30 parameter objects instead is much faster.
New performance:
55,000 rows (19 columns) in 0.53 seconds ≈ 100k inserts/second
internal const string PeakResultsInsert =
    @"INSERT INTO PeakResult values(@Id,@PeakID,@QuanPeakID,@ISTDRetentionTimeDiff)";

// Create the command once and reuse it for every row.
var command = cnn.CreateCommand();
command.CommandText = BatchConstants.PeakResultsInsert;

string[] parameterNames = new[]
{
    "@Id",
    "@PeakID",
    "@QuanPeakID",
    "@ISTDRetentionTimeDiff"
};

// Create each DbParameter exactly once and keep a reference to it.
DbParameter[] parameters = parameterNames.Select(pn =>
{
    DbParameter parameter = command.CreateParameter();
    parameter.ParameterName = pn;
    command.Parameters.Add(parameter);
    return parameter;
}).ToArray();

// For each row, only the parameter values change before executing the insert.
foreach (var peakResult in peakResults)
{
    parameters[0].Value = peakResult.Id;
    parameters[1].Value = peakResult.PeakID;
    parameters[2].Value = peakResult.QuanPeakID;
    parameters[3].Value = peakResult.ISTDRetentionTimeDiff;
    command.ExecuteNonQuery();
}
It turns out that I could not use Dapper for inserting into my large tables (for my small tables, I still use Dapper).
Some other things that I found:
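(For comparison, a rough sketch of the Dapper call this replaced; not my exact code, and it assumes the same SQL constant and list of objects. Dapper executes the statement once per item, building fresh parameters each time.)
// Sketch only: Dapper constructs new parameters for every row in the list.
cnn.Execute(BatchConstants.PeakResultsInsert, peakResults);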
I tried using multiple threads to insert data into the same database; this did not make any improvement.
I upgraded from System.Data.SQLite 1.0.69 to 1.0.79; it made no noticeable difference in performance.
I am not assigning a DbType to the DbParameters; it doesn't seem to make a performance difference either way.
For reads, I could not improve on Dapper's performance.