Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The Fastest Way to Load an Array DML in Delphi FireDAC

I am using Delphi XE8 with FireDAC to load a large SQLite database. To do so, I'm using the Array DML execution technique to efficiently insert a large number of records at once, like this:

FDQueryAddINDI.SQL.Text := 'insert into indi values ('
  + ':indikey, :hasdata, :gedcomnames, :sex, :birthdate, :died, '
  + ':deathdate, :changed, :eventlinesneedprocessing, :eventlines, '
  + ':famc, :fams, :linkinfo, :todo, :nextreportindi, :firstancestralloop'
  + ')';
FDQueryAddINDI.Params.Bindmode := pbByNumber; {more efficient than by name }
FDQueryAddINDI.Params.ArraySize := MaxParams; { large enough to load all of them } 

NumParams := 0;
repeat
  { the code to determin IndiKey,... is not shown, but goes here }

  FDQueryAddINDI.Params[0].AsStrings[NumParams] := IndiKey;   
  FDQueryAddINDI.Params[1].AsIntegers[NumParams] := HasData;
  FDQueryAddINDI.Params[2].AsStrings[NumParams] := GedcomNames;
  FDQueryAddINDI.Params[3].AsStrings[NumParams] := Sex;
  FDQueryAddINDI.Params[4].AsStrings[NumParams] := Birthdate;
  FDQueryAddINDI.Params[5].AsIntegers[NumParams] := Died;
  FDQueryAddINDI.Params[6].AsStrings[NumParams] := Deathdate;
  FDQueryAddINDI.Params[7].AsStrings[NumParams] := Changed;
  FDQueryAddINDI.Params[8].AsIntegers[NumParams] := EventLinesNeedProcessing;
  FDQueryAddINDI.Params[9].AsStrings[NumParams] := EventLines;
  FDQueryAddINDI.Params[10].AsIntegers[NumParams] := FamC;
  FDQueryAddINDI.Params[11].AsIntegers[NumParams] := FamS;
  FDQueryAddINDI.Params[12].AsIntegers[NumParams] := Linkinfo;
  FDQueryAddINDI.Params[13].AsIntegers[NumParams] := ToDo;
  FDQueryAddINDI.Params[14].AsIntegers[NumParams] := NextReportIndi;
  FDQueryAddINDI.Params[15].AsIntegers[NumParams] := FirstAncestralLoop;
  inc(NumParams);
until done;
FDQueryAddINDI.Params.ArraySize := NumParams;  { Reset to actual number }

FDQueryAddINDI.Execute(LogoAppForm.FDQueryAddINDI.Params.ArraySize);

The actual loading of the data into the SQLite database is very fast, and I have no problem with the speed of that.

What is slowing me down is the amount of time being taken in the repeat loop to assign all the values to the parameters.

The Params are built into FireDAC and are a TCollection. I do not have access to the source code, so I can't see what the AsStrings and AsIntegers methods are actually doing.

Assigning each value to each parameter for each insert does not appear to me to be a very efficient way to load this TCollection. Is there a faster way to load this? I'm thinking maybe a way to load a whole set of parameters at once, e.g. (IndiKey, HasData, ... FirstAncestralLoop) all as one. Or maybe to load my own TCollection as efficiently as I can, and then use the TCollection's Assign method to copy my TCollection into the the FireDAC's TCollection.

So my question is what would be the fastest way to load this TCollection of parameters that FireDAC requires?


Update: I'm including some timings for Arnaud.

As stated in Using SQLite with FireDAC (see its Array DML section):

Starting with v 3.7.11, SQLite supports the INSERT command with multiple VALUES. FireDAC uses this feature to implement Array DML, when Params.BindMode = pbByNumber. Otherwise, FireDAC emulates Array DML.

I've tested inserting 33,790 records changing the arraysize (number of records to load per execute), and timed the loading time with both pbByName (for emulation) and pbByNumber (using multiple values insert).

This was the timing:

Arraysize: 1, Executes: 33,790, Timing: 1530 ms (pbByName), 1449 ms (pbByNumber)
Arraysize: 10, Executes: 3,379, Timing: 1034 ms (pbByName), 782 ms (pbByNumber)
Arraysize: 100, Executes: 338, Timing:  946 ms (pbByName), 499 ms (pbByNumber)
Arraysize: 1000, Executes: 34, Timing: 890 ms (pbByName), 259 ms (pbByNumber)
Arraysize: 10000, Executes: 4, Timing: 849 ms (pbByName), 227 ms (pbByNumber)
Arraysize: 20000, Executes: 2, Timing: 594 ms (pbByName), 172 ms (pbByNumber)
Arraysize: 50000, Executes: 1, Timing: 94 ms (pbByName), 94 ms (pbByNumber)

Now the interesting thing about these timings is that the loading of those 33,790 records into the TCollection is taking a full 93 ms every single test run. It doesn't matter whether they are being added 1 at a time or 10000 at a time, this overhead of filling the TCollection of Params is always there.

For comparison, I did a larger test with 198,522 inserts just for pbByNumber:

Arraysize: 100, Executes: 1986, Timing: 2774 ms (pbByNumber)
Arraysize: 1000, Executes: 199, Timing: 1371 ms (pbByNumber)
Arraysize: 10000, Executes: 20, Timing: 1292 ms (pbByNumber)
Arraysize: 100000, Executes: 2, Timing: 894 ms (pbByNumber)
Arraysize: 1000000, Executes: 1, Timing: 506 ms (pbByNumber)

For all cases of this test, the overhead of loading the TCollection of Params takes about 503 ms.

So the loading of the TCollection seems to be at about 400,000 records per second. This is a significant portion of the insert time, and once I start working with large databases in the millions, this added time will be quite noticeable to the user of my program.

I would like to improve this, but I have not yet found a way to speed the loading of the Params.


Update 2: I was able to get about a 10% time improvement by putting all my code between a StartTransaction and a Commit, so that all blocks will be processed at once.

But I'm still looking for some way to load the TCollection of Params much faster.


One other idea:

What might work well and could be up to 16 times faster if it were possible would be something like the ParamValues method. This assigns multiple params at once and has the added advantage of directly supplying a variant array, and avoids the need to cast values.

It would work like this:

    FDQueryAddINDI.Params.ParamValues['indikey;hasdata;gedcomnames;sex;birthdate;died;deathdate;changed;eventlinesneedprocessing;eventlines;famc;fams;linkinfo;todo;nextreportindi;firstancestralloop']
       := VarArrayOf([Indikey, 0, ' ', ' ', ' ', 0, ' ', ' ', 1, ' ', -1, -1, -1, -1, -1, -1]);

However, ParamValues will only assign to the first set of Params, i.e. where NumIndiParms = 0.

Is there a way to do this for each Index in the loop, i.e., every instance of NumIndiParms?


Bounty: I really want to speed up the loading of the Params. I am now offering a bounty for someone to help find me a way to speed up the loading of the Params array TCollection as implemented in FireDAC.

like image 688
lkessler Avatar asked Jul 20 '15 17:07

lkessler


2 Answers

Sounds a bit like premature optimization to me. IMHO a profiler would show that the repeat .... until done loop takes much less time than the Execute call itself. Assigning an integer is almost instant, just like assigning a string, thanks to the CopyOnWrite paradigm of Delphi string type, which copies the text by reference.

Note that in practice, there is no array DML feature in SQLite3. FireDac emulates array DML by creating multiple insertion, i.e. executing

insert into indi values (?,?,?,....),(?,?,?,....),(?,?,?,....),....,(?,?,?,....);

AFAIK this is the fastest way of inserting data using SQLite3. At least until the upcoming OTA feature is available.

Also ensure that you nest your insert within several transactions, and that the number of parameters set at once are not too high. From my tests, you should also create several transactions, if you have a lot of rows to insert. Maintaining a single transaction slows the process down. 10000 rows per transaction is a good number, from experiment.

BTW, our ORM is able to do all this low-level plumbing on its own, depending on the backend engine it runs on.

Update: Sounds like if the FireDac parameters may be in your case a real bottleneck. You should therefore bypass FireDAC, and bind directly your TCollection content with the SQlite3 engine. Try e.g. our SynSQLite3.pas unit. Remember to prepare your INSERT statement, using a multiple insertion ((?,?,?,....),(?,?,?,....),....), then bind directly your values. BTW DB.pas may be a real bottleneck, this is why our whole ORM by-pass this layer (but may use it if needed).

Update2: Since you asked for it, here is a version using mORMot.

First you define your record:

type
  TSQLIndy = class(TSQLRecord)
...
  published
    property indikey: string read findikey write findikey;
    property hasdata: boolean read fhasdata write fhasdata;
    property gedcomnames: string read fgedcomnames write fgedcomnames;
    property sex: string read fsex write fsex;
    property birthdate: string read fbirthdate write fbirthdate;
    property died: boolean read fdied write fdied;
...
  end;

Then you run the insertion via the ORM:

db := TSQLRestServerDB.CreateWithOwnModel([TSQLIndy],'test.db3');
db.CreateMissingTables; // will CREATE TABLE if not existing
batch := TSQLRestBatch.Create(db,TSQLIndy,10000);
try
  indy := TSQLIndy.Create;
  try
    for i := 1 to COUNT do begin
      indy.indikey := IntToString(i);
      indy.hasdata := i and 1=0;
      ...
      batch.Add(indy,true);
    end;
  finally
    indy.Free;
  end;
  db.BatchSend(batch);

The full source code is available online on paste.ee.

Here are the timing for 1,000,000 records:

Prepared 1000000 rows in 874.54ms
Inserted 1000000 rows in 5.79s

If I calculate well, it is more than 170,000 rows per seconds for insertion. Here, the ORM is not an overhead, it is an advantage. All the multi INSERT work, transactions (every 10000 rows), marshaling would be done by the framework. The TSQLRestBatch would store all the content as JSON in memory, then compute the SQL at once. I'm curious how direct FireDAC performs in comparison. And you would be able to switch to other database, if needed - another RDBMS (MySQL, Oracle, MSSQL, FireBird) or even MongoDB. By just adding a new line.

Hope it helps!

like image 167
Arnaud Bouchez Avatar answered Nov 06 '22 23:11

Arnaud Bouchez


The best improvement I can find is to replace the AsString and AsInteger calls with Values calls. This prevents the assignment of the datatype (string or integer) to each item and saves about 10% of the overhead.

So 93 ms in the small test is down to 83 ms. And 503 ms in the big test is down to 456 ms.

FDQueryAddINDI.Params[0].Values[NumParams] := IndiKey;   
FDQueryAddINDI.Params[1].Values[NumParams] := HasData;
FDQueryAddINDI.Params[2].Values[NumParams] := GedcomNames;
FDQueryAddINDI.Params[3].Values[NumParams] := Sex;
FDQueryAddINDI.Params[4].Values[NumParams] := Birthdate;
FDQueryAddINDI.Params[5].Values[NumParams] := Died;
FDQueryAddINDI.Params[6].Values[NumParams] := Deathdate;
FDQueryAddINDI.Params[7].Values[NumParams] := Changed;
FDQueryAddINDI.Params[8].Values[NumParams] := EventLinesNeedProcessing;
FDQueryAddINDI.Params[9].Values[NumParams] := EventLines;
FDQueryAddINDI.Params[10].Values[NumParams] := FamC;
FDQueryAddINDI.Params[11].Values[NumParams] := FamS;
FDQueryAddINDI.Params[12].Values[NumParams] := Linkinfo;
FDQueryAddINDI.Params[13].Values[NumParams] := ToDo;
FDQueryAddINDI.Params[14].Values[NumParams] := NextReportIndi;
FDQueryAddINDI.Params[15].Values[NumParams] := FirstAncestralLoop;

The type can optionally be set initially when the file is opened. Maximum string lengths can be set as well. This does not have any effect on the time, and setting the lengths does not reduce the memory used. The types and lengths are set in this manner:

FDQueryAddINDI.Params[0].DataType := ftString;
FDQueryAddINDI.Params[1].DataType := ftInteger;
FDQueryAddINDI.Params[2].DataType := ftString;
FDQueryAddINDI.Params[3].DataType := ftString;
FDQueryAddINDI.Params[4].DataType := ftString;
FDQueryAddINDI.Params[5].DataType := ftInteger;
FDQueryAddINDI.Params[6].DataType := ftString;
FDQueryAddINDI.Params[7].DataType := ftString;
FDQueryAddINDI.Params[8].DataType := ftInteger;
FDQueryAddINDI.Params[9].DataType := ftString;
FDQueryAddINDI.Params[10].DataType := ftInteger;
FDQueryAddINDI.Params[11].DataType := ftInteger;
FDQueryAddINDI.Params[12].DataType := ftInteger;
FDQueryAddINDI.Params[13].DataType := ftInteger;
FDQueryAddINDI.Params[14].DataType := ftInteger;
FDQueryAddINDI.Params[15].DataType := ftInteger;
FDQueryAddINDI.Params[0].Size := 20;
FDQueryAddINDI.Params[2].Size := 1;
FDQueryAddINDI.Params[3].Size := 1;
FDQueryAddINDI.Params[4].Size := 1;
FDQueryAddINDI.Params[6].Size := 1;
FDQueryAddINDI.Params[7].Size := 1;
FDQueryAddINDI.Params[9].Size := 1;
like image 1
lkessler Avatar answered Nov 07 '22 00:11

lkessler