I was prompted to ask this while putting together an MCVE for another question.
I recently started noticing that TClientDataSet quickly runs out of memory. I had an issue in production where it couldn't load a dataset with about 60,000 records, which seemed surprisingly low to me. The client dataset was connected through a provider to an ADODataSet, which loaded fine. I ran that query separately and wrote the result to CSV, which gave me a file of < 30MB.
So I made a small test, where I can load up to about 165K records in the client dataset, which has a string field with a size of 4000. The actual value of the field is only 3 characters, but that doesn't seem to matter for the result.
It looks like each record takes up at least those 4000 characters. 4000 x 2 bytes x 165K records = 1.3GB, so that starts closing in on the 32-bit memory limit. If I turn it into a memo field, I can easily add 5 million rows.
program ClientDataSetTest;

{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient;

var
  c: TClientDataSet;
  i: Integer;
begin
  c := TClientDataSet.Create(nil);
  c.FieldDefs.Add('Id', ftInteger);
  c.FieldDefs.Add('Test', ftString, 4000); // Actually claims this much space...
  //c.FieldDefs.Add('Test', ftMemo); // Way more space efficient (and not notably slower)
  //c.FieldDefs.Add('Test', ftMemo, 1); // But specifying size doesn't have any effect.
  c.CreateDataSet;

  try
    i := 0;
    while i < 5000000 do
    begin
      c.Append;
      c['Id'] := i;
      c['Test'] := 'xyz';
      c.Post;

      // Report progress every 1000 rows
      if (i mod 1000) = 0 then
        WriteLn(i, ' ', c['Test']);
      Inc(i);
    end;
  except
    on e: Exception do
    begin
      // Discard the half-appended row and report where it failed
      c.Cancel;
      WriteLn('Error adding row ', i);
      WriteLn(e.ClassName, ': ', e.Message);
    end;
  end;

  c.SaveToFile('c:\temp\output.xml', dfXML);

  WriteLn('Press ''any'' key');
  ReadLn;
end.
The questions themselves are a bit broad, but I'd like a solution for this: a way to load larger data sets by using the string space a bit more efficiently. The field is large because it can contain an annotation. For most records that will be empty or short, though, so it's a tremendous waste of space.
For that last point, I happened to find this question, where hidden away in comments, vgLib is mentioned, but all I find about that is broken links, and I don't even know if it would solve this issue. Apparently the C++ code for MidasLib is available now, but since it's 1.5MB of obscure code, I thought it might be worth asking here before I dive into that. ;)
According to this page, it's possible to use TClientDataset as an in-memory dataset, completely independent of any actual databases or files. It describes how to setup the dataset's table structure and how to load data into it at runtime.
There is a difference between the way that blob fields (memo) and regular fields store and retrieve their data. Blob fields don't store data in the record buffer (see TBlobField.GetDataSize) and they use a different set of methods when storing or retrieving that data.
The size that each field occupies in the record buffer is returned by the call to TField.GetDataSize. For the TStringField, this is the required string size + 1. TCustomClientDataSet.InitBufferPointers uses this as part of its calculation for the value of FRecBufSize, which is used as the memory size to allocate for each record in TCustomClientDataSet.AllocRecordBuffer.
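As a rough way to see the difference described above, a small console program along these lines could print the public TField.DataSize property for a string field and a memo field. It is only a sketch: the program name and the field names are made up for illustration, it reuses the unit names from the question (newer Delphi versions may need Data.DB and Datasnap.DBClient instead), and like the question's program it assumes midas.dll or MidasLib is available.

```delphi
program FieldDataSizeDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient;

var
  c: TClientDataSet;
  i: Integer;
begin
  c := TClientDataSet.Create(nil);
  try
    // Same kind of layout as in the question, with both variants side by side
    c.FieldDefs.Add('Id', ftInteger);
    c.FieldDefs.Add('AsString', ftString, 4000);
    c.FieldDefs.Add('AsMemo', ftMemo);
    c.CreateDataSet;

    // DataSize is the value returned by each field's GetDataSize,
    // i.e. what the field contributes to the per-record buffer
    for i := 0 to c.FieldCount - 1 do
      WriteLn(c.Fields[i].FieldName, ': DataSize = ', c.Fields[i].DataSize);
  finally
    c.Free;
  end;
end.
```

Comparing the value printed for the string field with the one printed for the memo field should show how much of each record buffer is reserved up front, regardless of how short the stored value is.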
So, to answer your questions:
Whenever I need a rather long "string" field in a CDS, I tend to create a memo field instead. Besides the aforementioned display issue (which can be addressed rather painlessly), there are a few other restrictions, so I use a custom CDS descendant. HyperBase's (not vgLib's) internal string format is the same, so it won't change anything in that regard. By the way, there are DACs (such as FireDAC) that let you customize and choose the target field type mapping. I'm not sure whether the ADO components could be patched or enhanced to achieve similar functionality, though. Moreover, IIRC the FireDAC dataset has an option to control the internal string field layout (an "inline" in-row buffer or just a pointer to a dynamically allocated one), but it isn't a 1:1 replacement for CDS.