
How Can I Efficiently Read the First Few Lines of Many Files in Delphi?

I have a "Find Files" function in my program that will find text files with the .ged suffix that my program reads. I display the found results in an explorer-like window that looks like this:

[Screenshot: results window listing the found .ged files]

I use the standard FindFirst / FindNext methods, and this works very quickly. The 584 files shown above are found and displayed within a couple of seconds.

What I'd now like to do is add two columns to the display that show the "Source" and "Version" contained in each of these files. This information is usually found within the first 10 lines of each file, on lines that look like:

1 SOUR FTM
2 VERS Family Tree Maker (20.0.0.368)

Now I have no problem parsing this very quickly myself, and that is not what I'm asking.

What I need help with is simply how to most quickly load the first 10 or so lines from these files so that I can parse them.

I have tried StringList.LoadFromFile, but it takes too much time to load the large files, such as those above 1 MB.

Since I only need the first 10 lines or so, how would I best get them?

I'm using Delphi 2009, and my input files might or might not be Unicode, so this needs to work for any encoding.


Followup: Thanks Antonio,

I ended up doing this which works fine:

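var
  CurFileStream: TStream;
  Buffer: TBytes;
  Value: string;
  Encoding: TEncoding;
  BytesRead, PreambleSize: Integer;

// Create the stream before the try, so Free never runs on an unassigned variable
CurFileStream := TFileStream.Create(folder + FileName, fmOpenRead);
try
  SetLength(Buffer, 256);
  BytesRead := CurFileStream.Read(Buffer[0], 256);
  SetLength(Buffer, BytesRead);  // the file may be shorter than 256 bytes
  Encoding := nil;  // nil asks GetBufferEncoding to detect the encoding from the BOM
  PreambleSize := TEncoding.GetBufferEncoding(Buffer, Encoding);
  // Skip the BOM bytes so they do not end up in the decoded string
  Value := Encoding.GetString(Buffer, PreambleSize, Length(Buffer) - PreambleSize);
  ...
  (parse through Value to get what I want)
  ...
finally
  CurFileStream.Free;
end;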
asked Jan 30 '11 20:01 by lkessler


2 Answers

Use TFileStream and its Read method to read only the number of bytes you need. Here is an example that reads bitmap info, which is likewise stored at the beginning of the file:

http://www.delphidabbler.com/tips/19

answered Nov 10 '22 14:11 by Antonio Bakula


Just open the file yourself for block reading instead of using TStringList's built-in file loading, and read only the first block of the file. You can then load that block into a string list, for example with strings.SetText() if you read with the block functions, or with strings.LoadFromStream() if you read your blocks through streams.

I would personally just go with the FileRead/FileWrite block functions and load the block into a buffer. You could also use the similar WinAPI functions, but that is just more code for no reason.
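A minimal sketch of the block-function approach, assuming Delphi 2009 (where FileOpen returns an Integer handle; newer versions use THandle). The helper name ReadFileHead is hypothetical, and the share mode is an assumption:

```pascal
uses
  SysUtils;

// Hypothetical helper: returns up to MaxBytes raw bytes from the start of a
// file, or an empty array if the file cannot be opened or read.
function ReadFileHead(const FileName: string; MaxBytes: Integer): TBytes;
var
  Handle: Integer;    // Integer matches Delphi 2009; newer versions use THandle
  BytesRead: Integer;
begin
  SetLength(Result, 0);
  if MaxBytes <= 0 then
    Exit;
  Handle := FileOpen(FileName, fmOpenRead or fmShareDenyNone);
  if Handle = -1 then
    Exit;             // could not open the file
  try
    SetLength(Result, MaxBytes);
    BytesRead := FileRead(Handle, Result[0], MaxBytes);
    if BytesRead < 0 then
      BytesRead := 0; // FileRead returns -1 on error
    SetLength(Result, BytesRead);  // trim to what was actually read
  finally
    FileClose(Handle);
  end;
end;
```

The returned bytes are still undecoded, so they would need to go through something like TEncoding before being parsed as text.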

The OS reads files in blocks, which are at least 512 bytes on almost any platform or filesystem, so you can read 512 bytes first and hope that you got all 10 lines, which will be true if your lines are generally short enough. This will be practically as fast as reading 100 or 200 bytes.

Then, if you notice that your strings object holds fewer than 10 lines, just read the next 512-byte block and try to parse again. (Or just go with blocks of 1024, 2048 and so on; on many systems this will probably be as fast as 512-byte blocks, as filesystem cluster sizes are generally larger than 512 bytes.)
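The read-then-retry idea could be sketched roughly as follows. This is only an illustration under assumptions: the procedure name ReadFirstLines and the 16 KB cap are hypothetical, and TEncoding.GetBufferEncoding only detects encodings that carry a BOM, falling back to the default ANSI encoding otherwise. The loop stops once more than MinLines lines are present, because the last line of a partial read may be cut off mid-line or even mid-character.

```pascal
uses
  Classes, SysUtils;

// Hypothetical helper: fills Lines with the start of the file, reading in
// 512-byte steps until more than MinLines lines are present.
procedure ReadFirstLines(const FileName: string; Lines: TStrings;
  MinLines: Integer);
const
  BlockSize = 512;
  MaxHead = 16 * 1024;  // assumed safety cap, not part of the answer above
var
  Stream: TFileStream;
  Buffer: TBytes;
  Encoding: TEncoding;
  Total, Got, PreambleSize: Integer;
begin
  Lines.Clear;
  Total := 0;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      // Grow the buffer by one block and append the next chunk of the file.
      SetLength(Buffer, Total + BlockSize);
      Got := Stream.Read(Buffer[Total], BlockSize);
      Inc(Total, Got);
      SetLength(Buffer, Total);
      if Total = 0 then
        Break;  // empty file, nothing to decode
      // Decode everything read so far and split it into lines.
      Encoding := nil;  // nil requests BOM-based detection
      PreambleSize := TEncoding.GetBufferEncoding(Buffer, Encoding);
      Lines.Text := Encoding.GetString(Buffer, PreambleSize,
        Total - PreambleSize);
    until (Lines.Count > MinLines) or (Got < BlockSize) or (Total >= MaxHead);
  finally
    Stream.Free;
  end;
end;
```

Unless the end of the file was reached, the final entry in Lines should be treated as incomplete. Depending on the Delphi version, a multi-byte character split at the block boundary may be replaced or rejected during decoding.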

PS. Also, by using threads or the asynchronous functionality of the WinAPI file functions (CreateFile and such), you could load the data from the files asynchronously while the rest of your application keeps working. Specifically, the interface will not freeze while large directories are being read.

This will make the loading of your information appear faster, since the file list will load directly and the rest of the information will come up some milliseconds later, while not actually increasing the real reading speed.

Do this only if you have tried the other methods and you feel like you need the extra boost.
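For illustration only, a background loader built on TThread might look something like the sketch below. The class name THeadLoaderThread, the callback type THeadEvent and the 512-byte head size are all assumptions, not anything taken from the question's program. Each file head is handed to the main thread through Synchronize, so the callback can safely update the list view.

```pascal
uses
  Classes, SysUtils;

type
  // Hypothetical callback; it is invoked in the main (UI) thread.
  THeadEvent = procedure(Index: Integer; const Head: TBytes) of object;

  THeadLoaderThread = class(TThread)
  private
    FFiles: TStringList;
    FOnHead: THeadEvent;
    FIndex: Integer;
    FHead: TBytes;
    procedure DoHead;
  protected
    procedure Execute; override;
  public
    constructor Create(AFiles: TStrings; AOnHead: THeadEvent);
    destructor Destroy; override;
  end;

constructor THeadLoaderThread.Create(AFiles: TStrings; AOnHead: THeadEvent);
begin
  inherited Create(False);  // the thread starts once construction has finished
  FreeOnTerminate := True;
  FFiles := TStringList.Create;
  FFiles.Assign(AFiles);    // private copy, so the UI list is not touched here
  FOnHead := AOnHead;
end;

destructor THeadLoaderThread.Destroy;
begin
  inherited;                // lets a still-running thread finish first
  FFiles.Free;
end;

procedure THeadLoaderThread.DoHead;
begin
  if Assigned(FOnHead) then
    FOnHead(FIndex, FHead);
end;

procedure THeadLoaderThread.Execute;
var
  Stream: TFileStream;
  Got: Integer;
begin
  FIndex := 0;
  while (FIndex < FFiles.Count) and not Terminated do
  begin
    SetLength(FHead, 0);
    try
      Stream := TFileStream.Create(FFiles[FIndex],
        fmOpenRead or fmShareDenyWrite);
      try
        SetLength(FHead, 512);
        Got := Stream.Read(FHead[0], 512);
        SetLength(FHead, Got);
      finally
        Stream.Free;
      end;
    except
      on EStreamError do
        SetLength(FHead, 0);  // unreadable file: report an empty head
    end;
    Synchronize(DoHead);      // deliver the result in the main thread
    Inc(FIndex);
  end;
end;
```

A form would create the thread after the file list has been filled, passing the list of full file names and a method that decodes the bytes and fills in the two extra columns for the given index.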

answered Nov 10 '22 13:11 by Cray