Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does scrolling through ADOTable get slower and slower?

I want to read the entire table from an MS Access file and I'm trying to do it as fast as possible. When testing a big sample I found that the loop counter increases faster when it's reading the top records comparing to last records of the table. Here's a sample code that demonstrates this:

procedure TForm1.Button1Click(Sender: TObject);
const
  MaxRecords = 40000;
  Step = 5000;
var
  I, J: Integer;
  Table: TADOTable;
  T: Cardinal;
  Ts: TCardinalDynArray;
begin
  Table := TADOTable.Create(nil);
  Table.ConnectionString :=
    'Provider=Microsoft.ACE.OLEDB.12.0;'+
    'Data Source=BigMDB.accdb;'+
    'Mode=Read|Share Deny Read|Share Deny Write;'+
    'Persist Security Info=False';
  Table.TableName := 'Table1';
  Table.Open;

  J := 0;
  SetLength(Ts, MaxRecords div Step);
  T := GetTickCount;
  for I := 1 to MaxRecords do
  begin
    Table.Next;
    if ((I mod Step) = 0) then
    begin
      T := GetTickCount - T;
      Ts[J] := T;
      Inc(J);
      T := GetTickCount;
    end;
  end;
  Table.Free;

//  Chart1.SeriesList[0].Clear;
//  for I := 0 to Length(Ts) - 1 do
//  begin
//    Chart1.SeriesList[0].Add(Ts[I]/1000, Format(
//      'Records: %s %d-%d %s Duration:%f s',
//      [#13, I * Step, (I + 1)*Step, #13, Ts[I]/1000]));
//  end;
end;

And the result on my PC: enter image description here

The table has two string fields, one double and one integer. It has no primary key nor index field. Why does it happen and how can I prevent it?

like image 591
saastn Avatar asked Nov 27 '15 21:11

saastn


3 Answers

I can reproduce your results using an AdoQuery with an MS Sql Server dataset of similar size to yours.

However, after doing a bit of line-profiling, I think I've found the answer to this, and it's slightly counter-intuitive. I'm sure everyone who does DB programming in Delphi is used to the idea that looping through a dataset tends to be much quicker if you surround the loop by calls to Disable/EnableControls. But who would bother to do that if there are no db-aware controls attached to the dataset?

Well, it turns out that in your situation, even though there are no DB-aware controls, the speed increases hugely if you use Disable/EnableControls regardless.

The reason is that TCustomADODataSet.InternalGetRecord in AdoDB.Pas contains this:

      if ControlsDisabled then
        RecordNumber := -2 else
        RecordNumber := Recordset.AbsolutePosition;

and according to my line profiler, the while not AdoQuery1.Eof do AdoQuery1.Next loop spends 98.8% of its time executing the assignment

        RecordNumber := Recordset.AbsolutePosition;

! The calculation of Recordset.AbsolutePosition is hidden, of course, on the "wrong side" of the Recordset interface, but the fact that the time to call it apparently increases the further you go into the recordset makes it reasonable imo to speculate that it's calculated by counting from the start of the recordset's data.

Of course, ControlsDisabled returns true if DisableControls has been called and not undone by a call to EnableControls. So, retest with the loop surrounded by Disable/EnableControls and hopefully you'll get a similar result to mine. It looks like you were right that the slowdown isn't related to memory allocations.

Using the following code:

procedure TForm1.btnLoopClick(Sender: TObject);
var
  I: Integer;
  T: Integer;
  Step : Integer;
begin
  Memo1.Lines.BeginUpdate;
  I := 0;
  Step := 4000;
  if cbDisableControls.Checked then
    AdoQuery1.DisableControls;
  T := GetTickCount;
{.$define UseRecordSet}
{$ifdef UseRecordSet}
  while not AdoQuery1.Recordset.Eof do begin
    AdoQuery1.Recordset.MoveNext;
    Inc(I);
    if I mod Step = 0 then begin
      T := GetTickCount - T;
      Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
      T := GetTickCount;
    end;
  end;
{$else}
  while not AdoQuery1.Eof do begin
    AdoQuery1.Next;
    Inc(I);
    if I mod Step = 0 then begin
      T := GetTickCount - T;
      Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
      T := GetTickCount;
    end;
  end;
{$endif}
  if cbDisableControls.Checked then
    AdoQuery1.EnableControls;
  Memo1.Lines.EndUpdate;
end;

I get the following results (with DisableControls not called except where noted):

Using CursorLocation = clUseClient

AdoQuery.Next   AdoQuery.RecordSet    AdoQuery.Next 
                .MoveNext             + DisableControls

4000:157            4000:16             4000:15
8000:453            8000:16             8000:15
12000:687           12000:0             12000:32
16000:969           16000:15            16000:31
20000:1250          20000:16            20000:31
24000:1500          24000:0             24000:16
28000:1703          28000:15            28000:31
32000:1891          32000:16            32000:31
36000:2187          36000:16            36000:16
40000:2438          40000:0             40000:15
44000:2703          44000:15            44000:31
48000:3203          48000:16            48000:32

=======================================

Using CursorLocation = clUseServer

AdoQuery.Next   AdoQuery.RecordSet    AdoQuery.Next 
                .MoveNext             + DisableControls

4000:1031           4000:454            4000:563
8000:1016           8000:468            8000:562
12000:1047          12000:469           12000:500
16000:1234          16000:484           16000:532
20000:1047          20000:454           20000:546
24000:1063          24000:484           24000:547
28000:984           28000:531           28000:563
32000:906           32000:485           32000:500
36000:1016          36000:531           36000:578
40000:1000          40000:547           40000:500
44000:968           44000:406           44000:562
48000:1016          48000:375           48000:547

Calling AdoQuery1.Recordset.MoveNext calls directly into the MDac/ADO layer, of course, whereas AdoQuery1.Next involves all the overhead of the standard TDataSet model. As Serge Kraikov said, changing the CursorLocation certainly makes a difference and doesn't exhibit the slowdown we noticed, though obviously it's significantly slower than using clUseClient and calling DisableControls. I suppose it depends on exactly what you're trying to do whether you can take advantage of the extra speed of using clUseClient with RecordSet.MoveNext.

like image 121
MartynA Avatar answered Nov 14 '22 06:11

MartynA


When you open a table, ADO dataset internally creates special data structures to navigate dataset forward/backward - "dataset CURSOR". During navigation, ADO stores the list of already visited records to provide bidirectional navigation.
Seems ADO cursor code uses quadratic-time O(n2) algorithm to store this list.
But there are workaround - use server-side cursor:

Table.CursorLocation := clUseServer;  

I tested your code using this fix and get linear fetch time - fetching every next chunk of records takes the same time as previous.

PS Some other data access libraries provides special "unidirectional" datasets - this datasets can traverse only forward and don't even store already traversed records - you get constant memory consumption and linear fetch time.

like image 1
Serge Kraikov Avatar answered Nov 14 '22 07:11

Serge Kraikov


DAO is native to Access and (IMHO) is typically faster. Whether or not you switch, use the GetRows method. Both DAO and ADO support it. There is no looping. You can dump the entire recordset into an array with a couple of lines of code. Air code: yourrecordset.MoveLast yourrecordset.MoveFirst yourarray = yourrecordset.GetRows(yourrecordset.RecordCount)

like image 1
AVG Avatar answered Nov 14 '22 07:11

AVG