TStringList.LoadFromFile - Exceptions with Large Text Files

I'm running Delphi RAD Studio XE2.

I have some very large files, each containing a large number of lines. The lines themselves are small - just 3 tab-separated doubles. I want to load a file into a TStringList using TStringList.LoadFromFile, but this raises an exception with large files.

For files of 2 million lines (approximately 1GB) I get the EIntOverflow exception. For larger files (20 million lines and approximately 10GB, for example) I get the ERangeCheck exception.

I have 32GB of RAM to play with and am just trying to load this file and use it quickly. What's going on here, and what other options do I have? Could I use a file stream with a large buffer to load this file into a TStringList? If so, could you please provide an example?

Asked by Trojanian on Nov 19 '14 02:11



1 Answer

When Delphi switched to Unicode in Delphi 2009, the TStrings.LoadFromStream() method (which TStrings.LoadFromFile() calls internally) became very inefficient for large streams/files.

Internally, LoadFromStream() reads the entire file into memory as a TBytes, then converts that to a UnicodeString using TEncoding.GetString() (which decodes the bytes into a TCharArray, copies that into the final UnicodeString, and then frees the array), then parses the UnicodeString (while the TBytes is still in memory) adding substrings into the list as needed.

So, just prior to LoadFromStream() exiting, there are four copies of the file data in memory - three copies taking up at worst filesize * 3 bytes of memory (where each copy is using its own contiguous memory block + some MemoryMgr overhead), and one copy for the parsed substrings! Granted, the first three copies are freed when LoadFromStream() actually exits. But this explains why you are getting memory errors before reaching that point - LoadFromStream() is trying to use 3-4 GB of memory to load a 1GB file, and the RTL's memory manager cannot handle that.
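To make the sequence of copies easier to picture, here is a rough sketch of the load-everything-at-once pattern described above. It is not the RTL source: LoadAllAtOnce is a hypothetical name, the structure is simplified, and the comments only mark where each copy of the data is alive, following the description in this answer.

```pascal
program LoadAllAtOnceSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical illustration of the pattern described above, NOT the actual
// TStrings.LoadFromStream() implementation.
procedure LoadAllAtOnce(Stream: TStream; List: TStrings; Encoding: TEncoding);
var
  Buffer: TBytes;
  Size: Integer;
  Decoded: string;
begin
  // The whole remaining stream is read into one contiguous byte array.
  Size := Stream.Size - Stream.Position;
  SetLength(Buffer, Size);
  if Size > 0 then
    Stream.ReadBuffer(Buffer[0], Size);

  // The bytes are decoded into a UnicodeString while Buffer is still alive.
  Decoded := Encoding.GetString(Buffer);

  // The decoded text is split into lines; each line becomes a new string in
  // the list while Buffer and Decoded are both still allocated.
  List.Text := Decoded;
end;

var
  Source: TBytesStream;
  Lines: TStringList;
begin
  // Small in-memory sample standing in for a file.
  Source := TBytesStream.Create(TEncoding.UTF8.GetBytes('1'#9'2'#9'3'#13#10'4'#9'5'#9'6'));
  Lines := TStringList.Create;
  try
    LoadAllAtOnce(Source, Lines, TEncoding.UTF8);
    Writeln('Lines parsed: ', Lines.Count);
  finally
    Lines.Free;
    Source.Free;
  end;
end.
```

Every intermediate value here is sized in proportion to the whole file, which is the property the line-by-line approach avoids.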

If you want to load the content of a large file into a TStringList, you are better off using TStreamReader instead of LoadFromFile(). TStreamReader uses a buffered file I/O approach to read the file in small chunks. Simply call its ReadLine() method in a loop, Add()'ing each line to the TStringList. For example:

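// Replaces: MyStringList.LoadFromFile(filename);
// Reader is declared as a TStreamReader (Classes unit).
Reader := TStreamReader.Create(filename, true); // true = detect the encoding from a BOM
try
  // Suspend change notifications while the list is being filled.
  MyStringList.BeginUpdate;
  try
    MyStringList.Clear;
    // Only one buffered chunk of the file is held in memory at a time.
    while not Reader.EndOfStream do
      MyStringList.Add(Reader.ReadLine);
  finally
    MyStringList.EndUpdate;
  end;
finally
  Reader.Free;
end;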
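Since the question asks about using a larger buffer, the same loop can be wrapped in a small helper that passes a buffer size to the TStreamReader constructor. This is a minimal sketch under assumptions: LoadLinesBuffered, the 1 MB default buffer, and 'data.txt' are placeholders chosen for illustration, and TEncoding.Default is assumed to match the file's encoding when no BOM is present.

```pascal
program LoadLargeFile;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: fills List with the lines of FileName, reading the
// file through a TStreamReader in chunks of BufferSize bytes.
procedure LoadLinesBuffered(const FileName: string; List: TStrings;
  BufferSize: Integer = 1024 * 1024);
var
  Reader: TStreamReader;
begin
  // Encoding is the fallback when no BOM is found; True enables BOM detection.
  Reader := TStreamReader.Create(FileName, TEncoding.Default, True, BufferSize);
  try
    List.BeginUpdate;
    try
      List.Clear;
      while not Reader.EndOfStream do
        List.Add(Reader.ReadLine);
    finally
      List.EndUpdate;
    end;
  finally
    Reader.Free;
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    LoadLinesBuffered('data.txt', Lines); // placeholder path
    Writeln('Lines loaded: ', Lines.Count);
  finally
    Lines.Free;
  end;
end.
```

This only removes the transient whole-file copies. The finished list still holds every line as a UnicodeString, so a 32-bit build can still run out of address space on files of the sizes mentioned in the question.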

Maybe some day LoadFromStream() will be rewritten to use TStreamReader internally like this.

Answered by Remy Lebeau on Nov 15 '22 06:11