TClientDataSet uses too much memory for string fields

I was prompted to ask this question while trying to back up another question with an MCVE.

I recently started noticing that TClientDataSet quickly runs out of memory. I had an issue in production where it couldn't load a dataset of about 60,000 records, which seemed surprisingly low to me. The client dataset was connected through a provider to an ADODataSet, which loaded fine. I ran that query separately and wrote the result to CSV, which gave me a file of < 30MB.

So I made a small test, where I can load up to about 165K records in the client dataset, which has a string field with a size of 4000. The actual value of the field is only 3 characters, but that doesn't seem to matter for the result.

It looks like each record takes up at least those 4000 characters. 4000 x 2 bytes x 165K records = 1.3GB, so that starts closing in on the 32-bit memory limit. If I turn it into a memo field, I can easily add 5 million rows.
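The estimate above can be reproduced with a few lines of Delphi. This is only a sketch of the arithmetic: the program and constant names are made up for illustration, and the 2 bytes per character is the assumption used in the estimate, not a measured value.

```pascal
program EstimateStringFieldMemory;
{$APPTYPE CONSOLE}
uses SysUtils;

const
  FieldChars   = 4000;    // declared size of the string field
  BytesPerChar = 2;       // assumption taken from the estimate above
  RecordCount  = 165000;  // roughly where the test runs out of memory

var
  TotalBytes: Int64;
begin
  // Use Int64 so the product cannot overflow a 32-bit Integer.
  TotalBytes := Int64(FieldChars) * BytesPerChar * RecordCount;
  WriteLn(TotalBytes, ' bytes');  // 1320000000 bytes, i.e. about 1.3GB
end.
```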

program ClientDataSetTest;
{$APPTYPE CONSOLE}
uses SysUtils, DB, DBClient;

var
  c: TClientDataSet;
  i: Integer;
begin
  c := TClientDataSet.Create(nil);
  c.FieldDefs.Add('Id', ftInteger);
  c.FieldDefs.Add('Test', ftString, 4000); // Actually claims this much space...
  //c.FieldDefs.Add('Test', ftMemo); // Way more space efficient (and not notably slower)
  //c.FieldDefs.Add('Test', ftMemo, 1); // But specifying size doesn't have any effect.
  c.CreateDataSet;

  try
    i := 0;
    while i < 5000000 do
    begin
      c.Append;
      c['Id'] := i;
      c['Test'] := 'xyz';
      c.Post;

      if (i mod 1000) = 0 then
        WriteLn(i, c['Test']);

      Inc(i);
    end;

  except
    on e: Exception do
    begin
      c.Cancel;
      WriteLn('Error adding row ', i);
      Writeln(e.ClassName, ': ', e.Message);
    end;
  end;

  c.SaveToFile('c:\temp\output.xml', dfXML);
  Writeln('Press ''any'' key');
  ReadLn;
end.

The questions themselves are a bit broad, but I'd like a solution that lets me load larger data sets by using the string space more efficiently. The field is large because it can contain an annotation. For most records the annotation will be empty or short, though, so it's a tremendous waste of space.

  • Can TClientDataSet be configured in such a way that it handles this differently? I browsed its properties, but I can't find anything that seems related to this.
  • Can it be solved by using a different field type? I thought of ftMemo, but that has some other disadvantages, like the size not being used for truncation, and some display issues, like TDBGrid displaying it as (MEMO) instead of the actual value.
  • Are there drop-in replacements for TClientDataSet that solve this? It's not just about the in-memory part, but also about the communication with ADO components through a TProvider, which is the main way I use it in this project, so not any memory dataset would do the trick.

For that last point, I happened to find this question, where, hidden away in the comments, vgLib is mentioned, but all I can find about it are broken links, and I don't even know if it would solve this issue. Apparently the C++ code for MidasLib is available now, but since it's 1.5MB of obscure code, I thought it might be worth asking here before I dive into that. ;)

GolezTrol asked Mar 27 '19 11:03



2 Answers

There is a difference between the way that the blob fields (memo) and regular fields store and retrieve their data. Blob fields don't store data in the record buffer (see TBlobField.GetDataSize) and they use a different set of methods when storing or retrieving that data.

The size that each field contributes to the record is returned by the call to TField.GetDataSize. For the TStringField, this is the declared string size + 1.

TCustomClientDataSet.InitBufferPointers uses this as part of its calculation for the value of FRecBufSize, which is used as the memory size to allocate for each record in TCustomClientDataSet.AllocRecordBuffer.

So, to answer your questions:

  • TClientDataSet can't be configured to do this any differently.
  • It can be solved by other field types but they would have to descend from TBlobField. The buffer size is allocated up front so the regular fields can't contain different sizes depending on their contents.
  • I am not sure about drop-in replacements. DevExpress has dxMemData, but I don't know whether it runs into the same problems or whether it is a drop-in replacement.
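The difference in data size can be inspected without opening a dataset, by reading the public DataSize property of standalone field objects. This is a minimal sketch, assuming the classic unit names used in the question (DB rather than Data.DB); the program name is made up. Going by the description above, the string field should report its declared size + 1, and the memo field is expected to report a much smaller value because its data is not kept in the record buffer.

```pascal
program FieldDataSizeDemo;
{$APPTYPE CONSOLE}
uses SysUtils, DB;

var
  s: TStringField;
  m: TMemoField;
begin
  // Fields created without a dataset, only to query their buffer size.
  s := TStringField.Create(nil);
  m := TMemoField.Create(nil);
  try
    s.Size := 4000;  // same declared size as in the question
    WriteLn('String field DataSize: ', s.DataSize);
    WriteLn('Memo field DataSize:   ', m.DataSize);
  finally
    m.Free;
    s.Free;
  end;
end.
```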
Graymatter answered Oct 18 '22 03:10


Whenever I need a rather long "string" field in a CDS, I tend to create a memo field instead. Besides the aforementioned display issue (which can be addressed rather painlessly), there are a few other restrictions, so I have a custom CDS descendant. HyperBase (not vgLib) uses the same internal string format, so it won't change anything in that regard. By the way, there are data access components (such as FireDAC) that allow you to customize the target field type mapping. I'm not sure whether the ADO components could be patched or enhanced to achieve similar functionality, though. Moreover, IIRC, the FireDAC dataset has an option to control the internal string field layout (an "inline" in-row buffer or just a pointer to a dynamically allocated one), but it isn't a 1:1 replacement for CDS.
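One common way to address the display issue is an OnGetText handler on the memo field, so that data-aware controls such as TDBGrid receive the stored text instead of the (MEMO) placeholder. The following is a sketch of that idea, not necessarily what the custom descendant mentioned above does; the class name TMemoTextHelper and the program name are made up, and it assumes midas.dll is available (or MidasLib is added to the uses clause), as in the question's test program.

```pascal
program MemoDisplayDemo;
{$APPTYPE CONSOLE}
uses SysUtils, DB, DBClient;

type
  // Hypothetical helper; an event handler has to be a method of an object.
  TMemoTextHelper = class
    procedure MemoGetText(Sender: TField; var Text: string;
      DisplayText: Boolean);
  end;

procedure TMemoTextHelper.MemoGetText(Sender: TField; var Text: string;
  DisplayText: Boolean);
begin
  // Hand out the actual memo contents as the display text.
  Text := Sender.AsString;
end;

var
  c: TClientDataSet;
  h: TMemoTextHelper;
begin
  c := TClientDataSet.Create(nil);
  h := TMemoTextHelper.Create;
  try
    c.FieldDefs.Add('Id', ftInteger);
    c.FieldDefs.Add('Test', ftMemo);
    c.CreateDataSet;
    c.FieldByName('Test').OnGetText := h.MemoGetText;

    c.Append;
    c['Id'] := 1;
    c['Test'] := 'xyz';
    c.Post;

    // DisplayText is what grids show; with the handler assigned it
    // should be the stored value rather than the placeholder.
    WriteLn(c.FieldByName('Test').DisplayText);
  finally
    h.Free;
    c.Free;
  end;
end.
```

In a form-based application the handler would typically live on the form or data module and be assigned once after the dataset is opened. This only covers display; truncating input to a maximum length would need separate handling.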

vavan answered Oct 18 '22 02:10