I want to read the entire table from an MS Access file and I'm trying to do it as fast as possible. When testing a big sample I found that the loop counter increases faster when it's reading the top records comparing to last records of the table. Here's a sample code that demonstrates this:
procedure TForm1.Button1Click(Sender: TObject);
const
MaxRecords = 40000;
Step = 5000;
var
I, J: Integer;
Table: TADOTable;
T: Cardinal;
Ts: TCardinalDynArray;
begin
Table := TADOTable.Create(nil);
Table.ConnectionString :=
'Provider=Microsoft.ACE.OLEDB.12.0;'+
'Data Source=BigMDB.accdb;'+
'Mode=Read|Share Deny Read|Share Deny Write;'+
'Persist Security Info=False';
Table.TableName := 'Table1';
Table.Open;
J := 0;
SetLength(Ts, MaxRecords div Step);
T := GetTickCount;
for I := 1 to MaxRecords do
begin
Table.Next;
if ((I mod Step) = 0) then
begin
T := GetTickCount - T;
Ts[J] := T;
Inc(J);
T := GetTickCount;
end;
end;
Table.Free;
// Chart1.SeriesList[0].Clear;
// for I := 0 to Length(Ts) - 1 do
// begin
// Chart1.SeriesList[0].Add(Ts[I]/1000, Format(
// 'Records: %s %d-%d %s Duration:%f s',
// [#13, I * Step, (I + 1)*Step, #13, Ts[I]/1000]));
// end;
end;
And the result on my PC:
The table has two string fields, one double and one integer. It has no primary key nor index field. Why does it happen and how can I prevent it?
I can reproduce your results using an AdoQuery with an MS Sql Server dataset of similar size to yours.
However, after doing a bit of line-profiling, I think I've found the answer to this, and it's slightly counter-intuitive. I'm sure everyone who does DB programming in Delphi is used to the idea that looping through a dataset tends to be much quicker if you surround the loop by calls to Disable/EnableControls. But who would bother to do that if there are no db-aware controls attached to the dataset?
Well, it turns out that in your situation, even though there are no DB-aware controls, the speed increases hugely if you use Disable/EnableControls regardless.
The reason is that TCustomADODataSet.InternalGetRecord in AdoDB.Pas contains this:
if ControlsDisabled then
RecordNumber := -2 else
RecordNumber := Recordset.AbsolutePosition;
and according to my line profiler, the while not AdoQuery1.Eof do AdoQuery1.Next loop spends 98.8% of its time executing the assignment
RecordNumber := Recordset.AbsolutePosition;
! The calculation of Recordset.AbsolutePosition is hidden, of course, on the "wrong side" of the Recordset interface, but the fact that the time to call it apparently increases the further you go into the recordset makes it reasonable imo to speculate that it's calculated by counting from the start of the recordset's data.
Of course, ControlsDisabled
returns true if DisableControls
has been called and not undone by a call to EnableControls
. So, retest with the loop surrounded by Disable/EnableControls and hopefully you'll get a similar result to mine. It looks like you were right that the slowdown isn't related to memory allocations.
Using the following code:
procedure TForm1.btnLoopClick(Sender: TObject);
var
I: Integer;
T: Integer;
Step : Integer;
begin
Memo1.Lines.BeginUpdate;
I := 0;
Step := 4000;
if cbDisableControls.Checked then
AdoQuery1.DisableControls;
T := GetTickCount;
{.$define UseRecordSet}
{$ifdef UseRecordSet}
while not AdoQuery1.Recordset.Eof do begin
AdoQuery1.Recordset.MoveNext;
Inc(I);
if I mod Step = 0 then begin
T := GetTickCount - T;
Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
T := GetTickCount;
end;
end;
{$else}
while not AdoQuery1.Eof do begin
AdoQuery1.Next;
Inc(I);
if I mod Step = 0 then begin
T := GetTickCount - T;
Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
T := GetTickCount;
end;
end;
{$endif}
if cbDisableControls.Checked then
AdoQuery1.EnableControls;
Memo1.Lines.EndUpdate;
end;
I get the following results (with DisableControls not called except where noted):
Using CursorLocation = clUseClient
AdoQuery.Next AdoQuery.RecordSet AdoQuery.Next
.MoveNext + DisableControls
4000:157 4000:16 4000:15
8000:453 8000:16 8000:15
12000:687 12000:0 12000:32
16000:969 16000:15 16000:31
20000:1250 20000:16 20000:31
24000:1500 24000:0 24000:16
28000:1703 28000:15 28000:31
32000:1891 32000:16 32000:31
36000:2187 36000:16 36000:16
40000:2438 40000:0 40000:15
44000:2703 44000:15 44000:31
48000:3203 48000:16 48000:32
=======================================
Using CursorLocation = clUseServer
AdoQuery.Next AdoQuery.RecordSet AdoQuery.Next
.MoveNext + DisableControls
4000:1031 4000:454 4000:563
8000:1016 8000:468 8000:562
12000:1047 12000:469 12000:500
16000:1234 16000:484 16000:532
20000:1047 20000:454 20000:546
24000:1063 24000:484 24000:547
28000:984 28000:531 28000:563
32000:906 32000:485 32000:500
36000:1016 36000:531 36000:578
40000:1000 40000:547 40000:500
44000:968 44000:406 44000:562
48000:1016 48000:375 48000:547
Calling AdoQuery1.Recordset.MoveNext
calls directly into the MDac/ADO layer, of
course, whereas AdoQuery1.Next involves all the overhead of the standard TDataSet
model. As Serge Kraikov said, changing the CursorLocation certainly makes a difference and doesn't exhibit the slowdown we noticed, though obviously it's significantly slower than using clUseClient and calling DisableControls. I suppose it depends on exactly what you're trying to do whether you can take advantage of the extra speed of using clUseClient with RecordSet.MoveNext.
When you open a table, ADO dataset internally creates special data structures to navigate dataset forward/backward - "dataset CURSOR". During navigation, ADO stores the list of already visited records to provide bidirectional navigation.
Seems ADO cursor code uses quadratic-time O(n2) algorithm to store this list.
But there are workaround - use server-side cursor:
Table.CursorLocation := clUseServer;
I tested your code using this fix and get linear fetch time - fetching every next chunk of records takes the same time as previous.
PS Some other data access libraries provides special "unidirectional" datasets - this datasets can traverse only forward and don't even store already traversed records - you get constant memory consumption and linear fetch time.
DAO is native to Access and (IMHO) is typically faster.
Whether or not you switch, use the GetRows method. Both DAO and ADO support it.
There is no looping. You can dump the entire recordset into an array with a couple of lines of code. Air code:
yourrecordset.MoveLast
yourrecordset.MoveFirst
yourarray = yourrecordset.GetRows(yourrecordset.RecordCount)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With