
TStringList of objects taking up tons of memory in Delphi XE

I'm working on a simulation program.

One of the first things the program does is read in a huge file (28 MB, about 79,000 lines), parse each line (about 150 fields), create an object for it, and add that object to a TStringList.

It also reads in another file, which adds more objects during the run. By the end, there are about 85,000 objects.

I was working with Delphi 2007, and the program used a lot of memory, but it ran OK. I upgraded to Delphi XE and migrated the program over, and now it's using a LOT more memory and runs out of memory halfway through the run.

In Delphi 2007 it would end up using 1.4 GB after reading in the initial file, which is obviously a huge amount, but in XE it ends up using almost 1.8 GB, which leads to running out of memory and getting the error.

So my questions are:

  1. Why is it using so much memory?
  2. Why is it using so much more memory in XE than 2007?
  3. What can I do about this? I can't change how big or long the file is, and I do need to create an object for each line and store it somewhere.

Thanks

KingOfKong asked Aug 25 '11 16:08



3 Answers

Just one idea which may save memory.

You could leave the data in the original files, and just point to it from in-memory structures.

For instance, this is how we browse big log files almost instantly: we memory-map the log file content, parse it quickly to create indexes of the useful information in memory, and then read the content on demand. No string is created during the reading, only pointers to the beginning of each line, plus dynamic arrays containing the needed indexes. Calling TStringList.LoadFromFile would definitely be much slower and use much more memory.

The code is here - see the TSynLogFile class. The trick is to read the file only once, and make all indexes on the fly.

For instance, here is how we retrieve a line of text from the UTF-8 file content:

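function TMemoryMapText.GetString(aIndex: integer): string;
begin
  // the cardinal() casts reject negative and too-large indexes in one comparison
  if (self=nil) or (cardinal(aIndex)>=cardinal(fCount)) then
    result := '' else
    // decode only the requested line, straight from the memory-mapped buffer
    result := UTF8DecodeToString(fLines[aIndex],GetLineSize(fLines[aIndex],fMapEnd));
end;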
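
To make the indexing step more concrete, here is a minimal sketch of memory-mapping a file and collecting a pointer to the start of each line. It is not the TSynLogFile code: the names IndexLines and TLineStarts are made up for illustration, the file name is assumed to come from the command line, and the file is assumed to be non-empty, smaller than 4 GB, and small enough to map into the 32-bit address space.

```pascal
program MapLinesSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  TLineStarts = array of PAnsiChar; // hypothetical name, one pointer per line

// Scans the buffer [P, PEnd) once and records where every line begins.
// No string is allocated; only pointers into the mapped file are stored.
function IndexLines(P, PEnd: PAnsiChar): TLineStarts;
var
  Count: Integer;
begin
  Count := 0;
  SetLength(Result, 1024);
  while P < PEnd do
  begin
    if Count = Length(Result) then
      SetLength(Result, Count * 2); // grow geometrically
    Result[Count] := P;
    Inc(Count);
    // skip to the end of the current line
    while (P < PEnd) and (P^ <> #10) do
      Inc(P);
    if P < PEnd then
      Inc(P); // step over the #10
  end;
  SetLength(Result, Count);
end;

var
  hFile, hMap: THandle;
  Size: Cardinal;
  Base: PAnsiChar;
  Lines: TLineStarts;
begin
  hFile := CreateFile(PChar(ParamStr(1)), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    Size := GetFileSize(hFile, nil); // assumption: file is smaller than 4 GB
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      Base := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
      if Base = nil then
        RaiseLastOSError;
      try
        Lines := IndexLines(Base, Base + Size);
        Writeln(Length(Lines), ' lines indexed');
      finally
        UnmapViewOfFile(Base);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end.
```

The pointers are only valid while the view is mapped, so a real program would keep the mapping open for as long as it needs to read lines.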

We use the exact same trick to parse JSON content. Such a mixed approach is also used by the fastest XML access libraries.

To handle your high-level data, and query it fast, you may try to use dynamic arrays of records, and our optimized TDynArray and TDynArrayHashed wrappers (in the same unit). Arrays of records will consume less memory, will be faster to search because the data won't be fragmented (even faster if you use ordered indexes or hashes), and you'll still have high-level access to the content (you can define custom functions to retrieve the data from the memory-mapped file, for instance). Dynamic arrays are not suited to fast deletion of items (or you'll have to use lookup tables), but you wrote that you are not deleting much data, so it won't be a problem in your case.
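
As a rough illustration of the array-of-records idea using plain Delphi only (not the TDynArray wrappers), the sketch below stores one small record per line and finds an item by binary search on an ordered key. The record layout, the field names and FindByID are all hypothetical; a real record would hold whichever of the 150 fields the simulation actually queries.

```pascal
program RecordArraySketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical layout: a few parsed fields plus a pointer back to the raw line.
  TSimItem = record
    ID: Integer;
    Weight: Double;
    LineStart: PAnsiChar; // would point into the memory-mapped file
  end;
  TSimItems = array of TSimItem;

// Binary search on the ID field; Items must be sorted by ascending ID.
// Returns the index of the matching item, or -1 if there is none.
function FindByID(const Items: TSimItems; ID: Integer): Integer;
var
  First, Last, Mid: Integer;
begin
  First := 0;
  Last := High(Items);
  while First <= Last do
  begin
    Mid := First + (Last - First) div 2;
    if Items[Mid].ID = ID then
    begin
      Result := Mid;
      Exit;
    end
    else if Items[Mid].ID < ID then
      First := Mid + 1
    else
      Last := Mid - 1;
  end;
  Result := -1;
end;

var
  Items: TSimItems;
  I: Integer;
begin
  // One contiguous allocation for all items instead of 85,000 separate objects.
  SetLength(Items, 85000);
  for I := 0 to High(Items) do
  begin
    Items[I].ID := I * 2;      // made-up, already sorted keys
    Items[I].Weight := I / 10;
    Items[I].LineStart := nil; // would be set while indexing the file
  end;
  Writeln(FindByID(Items, 1000)); // index of the item with ID 1000
  Writeln(FindByID(Items, 1001)); // -1, odd IDs do not exist here
end.
```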

So you won't have any duplicated structure any more, only logic in RAM, and data on memory-mapped file(s) - I added a "s" here because the same logic could perfectly map to several source data files (you need some "merge" and "live refresh" AFAIK).

Arnaud Bouchez answered Nov 15 '22 10:11



It's hard to say why your 28 MB file is expanding to 1.4 GB worth of objects when you parse it, without seeing the code and the class declarations. Also, you say you're storing it in a TStringList instead of a TList or TObjectList. This sounds like you're using it as some sort of string->object key/value mapping. If so, you might want to look at the TDictionary class in the Generics.Collections unit in XE.

As for why you're using more memory in XE, it's because the string type changed from an ANSI string to a UTF-16 string in Delphi 2009. If you don't need Unicode, you could use a TDictionary keyed on AnsiString (for example TDictionary<AnsiString, TMyObject>, where TMyObject stands for your own class) to save space.
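
A minimal sketch of such a dictionary follows. TSimObject and its fields are placeholders for the real class, and the key value is invented. Note that the saving only materializes if the string fields inside the objects are declared as AnsiString too.

```pascal
program AnsiDictSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Generics.Collections;

type
  // Hypothetical data class; string fields are AnsiString to stay at 1 byte/char.
  TSimObject = class
  public
    Name: AnsiString;
    Value: Double;
  end;

var
  Map: TDictionary<AnsiString, TSimObject>;
  Obj: TSimObject;
begin
  Map := TDictionary<AnsiString, TSimObject>.Create;
  try
    Obj := TSimObject.Create;
    Obj.Name := 'ITEM-0001';
    Obj.Value := 42;
    Map.Add(Obj.Name, Obj);

    // Hashed lookup by key instead of a linear or sorted TStringList search.
    if Map.TryGetValue('ITEM-0001', Obj) then
      Writeln(Obj.Value:0:1);
  finally
    // TDictionary does not own its values, so free them explicitly.
    for Obj in Map.Values do
      Obj.Free;
    Map.Free;
  end;
end.
```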

Also, to save even more memory, there's another trick you could use if you don't need all 79,000 of the objects right away: lazy loading. The idea goes something like this:

  • Read the file into a TStringList. (This will use about as much memory as the file size. Maybe twice as much if it gets converted into Unicode strings.) Don't create any data objects.
  • When you need a specific data object, call a routine that checks the string list and looks up the string key for that object.
  • Check if that string has an object associated with it. If not, create the object from the string and associate it with the string in the TStringList.
  • Return the object associated with the string.
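
The steps above could look roughly like the sketch below. It assumes, purely for illustration, that each line has the form KEY=field1;field2;... so that TStringList.IndexOfName can find it; TSimObject, CreateFromLine and GetSimObject are made-up names, and the "parsing" is a placeholder. IndexOfName is a linear scan, so a real program with 79,000 lines would probably want a faster key lookup.

```pascal
program LazyLoadSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical data class built from one line of the file.
  TSimObject = class
  public
    Fields: string;
    constructor CreateFromLine(const ALine: string);
  end;

constructor TSimObject.CreateFromLine(const ALine: string);
begin
  inherited Create;
  Fields := ALine; // placeholder: real code would parse the ~150 fields here
end;

// Returns the object for Key, creating it on first access only.
// Returns nil if no line has that key.
function GetSimObject(Lines: TStringList; const Key: string): TSimObject;
var
  Idx: Integer;
begin
  Result := nil;
  Idx := Lines.IndexOfName(Key); // assumes lines look like KEY=...
  if Idx < 0 then
    Exit;
  Result := TSimObject(Lines.Objects[Idx]);
  if Result = nil then
  begin
    Result := TSimObject.CreateFromLine(Lines.ValueFromIndex[Idx]);
    Lines.Objects[Idx] := Result; // cache it next to its source string
  end;
end;

var
  Lines: TStringList;
  A, B: TSimObject;
begin
  Lines := TStringList.Create;
  try
    Lines.OwnsObjects := True; // the list frees the cached objects
    // Stand-in for Lines.LoadFromFile(...)
    Lines.Add('A1=10;20;30');
    Lines.Add('B2=40;50;60');

    A := GetSimObject(Lines, 'B2'); // object is created here
    B := GetSimObject(Lines, 'B2'); // cached object is returned
    Writeln(A.Fields);
    Writeln(A = B);
  finally
    Lines.Free;
  end;
end.
```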

This will keep both your memory usage and your load time down, but it's only helpful if you don't need all (or a large percentage) of the objects immediately after loading.

Mason Wheeler answered Nov 15 '22 10:11



  • In Delphi 2007 (and earlier), a string is an Ansi string, that is, every character occupies 1 byte of memory.

  • In Delphi 2009 (and later), a string is a Unicode string, that is, every character occupies 2 bytes of memory.

AFAIK, there is no way to make a Delphi 2009+ TStringList object use Ansi strings. Are you really using any of the features of the TStringList? If not, you could use an array of strings instead.

Then, naturally, you can choose between

type
  TAnsiStringArray = array of AnsiString;
  // or
  TUnicodeStringArray = array of string; // In Delphi 2009+, 
                                         // string = UnicodeString
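
As a sketch of the first option, the lines of a file could be read into a TAnsiStringArray without going through TStringList at all. LoadAnsiLines is a made-up helper, the file name is assumed to come from the command line, and the file is assumed to be ANSI-encoded text.

```pascal
program AnsiArraySketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TAnsiStringArray = array of AnsiString;

// Reads every line of an ANSI text file into an array of AnsiString,
// so each character keeps occupying 1 byte in memory.
function LoadAnsiLines(const FileName: string): TAnsiStringArray;
var
  F: TextFile;
  Line: AnsiString;
  Count: Integer;
begin
  Count := 0;
  SetLength(Result, 1024);
  AssignFile(F, FileName);
  Reset(F);
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      if Count = Length(Result) then
        SetLength(Result, Count * 2); // grow geometrically
      Result[Count] := Line;
      Inc(Count);
    end;
  finally
    CloseFile(F);
  end;
  SetLength(Result, Count);
end;

var
  Lines: TAnsiStringArray;
begin
  Lines := LoadAnsiLines(ParamStr(1));
  Writeln(Length(Lines), ' lines loaded');
end.
```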
Andreas Rejbrand answered Nov 15 '22 09:11
