Delphi: How to search in a binary file faster?

I have a binary file (2.5 MB) and I want to find the position of this sequence of bytes: CD 09 D9 F5. Then I want to write some data after this position and also overwrite the old data (4 KB) with zeros.

Here is how I do it now, but it is a bit slow.

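procedure ProcessFile(const dataToWrite: AnsiString);
var
  fileContent, marker: AnsiString;
  f: file of AnsiChar; // AnsiChar is one byte, also in Unicode Delphi (2009+)
  c: AnsiChar;
  n, i, startIndex, endIndex: Integer;
begin
  AssignFile(f, 'file.bin');
  Reset(f);
  // Read the whole file one character at a time
  n := FileSize(f);
  while n > 0 do
  begin
    Read(f, c);
    fileContent := fileContent + c;
    Dec(n);
  end;
  // Do not close f here: Seek and Write below need the file to be open

  // The marker bytes CD 09 D9 F5
  SetLength(marker, 4);
  marker[1] := AnsiChar($CD);
  marker[2] := AnsiChar($09);
  marker[3] := AnsiChar($D9);
  marker[4] := AnsiChar($F5);

  // Pos is 1-based and returns 0 if the marker is not found
  startIndex := Pos(marker, fileContent);
  if startIndex = 0 then
  begin
    CloseFile(f);
    Exit;
  end;
  // The marker occupies 1-based positions p..p+3, so the byte after it
  // is at 0-based file offset p+3, which is what Seek expects
  startIndex := startIndex + 3;
  endIndex := startIndex + 4088;

  Seek(f, startIndex);

  for i := 1 to Length(dataToWrite) do
    Write(f, dataToWrite[i]);

  // A for-loop variable is undefined after the loop, so set the position explicitly
  c := #0;
  i := startIndex + Length(dataToWrite);
  while i < endIndex do
  begin
    Write(f, c);
    Inc(i);
  end;

  CloseFile(f);
end;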
asked by Alex P., Jan 13 '23 19:01



2 Answers

See this answer: Fast read/write from file in delphi

Some options are:

  • memory mapped files
  • TFileStream
  • BlockRead
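As a rough sketch of the BlockRead option (assuming Delphi 2009 or later, or Free Pascal in Delphi mode; the function name `LoadFileBlockRead` is hypothetical), the whole file can be pulled into a byte array with a single call instead of one `Read` per byte:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Hypothetical helper: load a whole file with one BlockRead call.
// An untyped file opened with record size 1 makes FileSize and BlockRead count bytes.
function LoadFileBlockRead(const FileName: string): TBytes;
var
  F: file;
  OldMode, Size, BytesRead: Integer;
begin
  AssignFile(F, FileName);
  OldMode := FileMode;
  FileMode := fmOpenRead; // Reset honours the global FileMode; open read-only
  try
    Reset(F, 1); // record size 1 byte
  finally
    FileMode := OldMode; // restore the global setting for other code
  end;
  try
    Size := FileSize(F);
    SetLength(Result, Size); // one allocation for the whole file
    if Size > 0 then
    begin
      BlockRead(F, Result[0], Size, BytesRead); // one call reads everything
      if BytesRead <> Size then
        raise EInOutError.Create('Short read from ' + FileName);
    end;
  finally
    CloseFile(F);
  end;
end;
```

Using `TBytes` rather than `string` also sidesteps the character-size issue: in Unicode Delphi, `Char` is two bytes, so byte-oriented data is safer in a byte array.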

To search the file buffer, see Best way to find position in the Stream where given byte sequence starts. One answer there mentions the Boyer-Moore algorithm for fast detection of a byte sequence.
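For illustration, here is a sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that keeps only the "bad character" shift table. `FindBytes` is a hypothetical name; it searches the first `Count` bytes of a buffer and returns a 0-based index, or -1 if the pattern is absent:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Boyer-Moore-Horspool search over Buffer[0..Count-1].
// Returns the 0-based index of the first occurrence of Pattern, or -1.
function FindBytes(const Buffer: array of Byte; Count: Integer;
  const Pattern: array of Byte): Integer;
var
  Shift: array[Byte] of Integer;
  M, i, j, k: Integer;
begin
  Result := -1;
  if Count > Length(Buffer) then
    Count := Length(Buffer);
  M := Length(Pattern);
  if (M = 0) or (M > Count) then
    Exit;
  // A byte that does not occur in the pattern lets us skip the whole pattern length
  for k := 0 to 255 do
    Shift[k] := M;
  // Bytes in the pattern (except its last byte) allow a smaller shift
  for k := 0 to M - 2 do
    Shift[Pattern[k]] := M - 1 - k;
  i := 0;
  while i <= Count - M do
  begin
    // Compare right to left
    j := M - 1;
    while (j >= 0) and (Buffer[i + j] = Pattern[j]) do
      Dec(j);
    if j < 0 then
      Exit(i);
    // Shift by the table entry for the byte under the pattern's last position
    Inc(i, Shift[Buffer[i + M - 1]]);
  end;
end;
```

For the question's marker, `FindBytes(Buf, Length(Buf), [$CD, $09, $D9, $F5])` returns the index of the `$CD` byte, and the data to overwrite starts 4 bytes later. With a 4-byte pattern the skip is at most 4 bytes, so the gain over a plain scan is modest; Horspool pays off more with longer patterns.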

answered by mjn, Jan 21 '23 15:01



Your code to read the entire file into a string is very wasteful. Pascal I/O is buffered, so I don't think reading byte by byte is the main cost in itself, although one big read would still be better. The main problem is the string concatenation: appending one character at a time places an extreme demand on the heap, because the string may have to be reallocated (and copied) on every iteration.

I'd do it like this:

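// Needs Classes (TFileStream) and SysUtils (fmOpenRead) in the uses clause
function LoadFileIntoString(const FileName: string): AnsiString;
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead);
  try
    // AnsiString has one byte per character, so Length(Result) is the byte count
    SetLength(Result, Stream.Size);//one single heap allocation
    if Length(Result) > 0 then
      Stream.ReadBuffer(Pointer(Result)^, Length(Result));
  finally
    Stream.Free;
  end;
end;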

That alone should make a big difference. For writing the file, a similar approach (preparing the output in memory and writing it in one go) will be much faster. I've not attempted to decipher the writing part of your code, but again, the new data and the block of zeros should be batched into as few separate writes as possible.
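A minimal sketch of such a batched write with `TFileStream`, assuming the offset just past the marker is already known. The helper name `WriteBlock` and its `BlockSize` parameter are hypothetical. The new data goes out in one `WriteBuffer` call and the zero padding in a second one:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical helper: overwrite BlockSize bytes starting at Offset with Data
// followed by zero padding, using at most two writes instead of one per byte.
procedure WriteBlock(const FileName: string; Offset: Int64;
  const Data: TBytes; BlockSize: Integer);
var
  Stream: TFileStream;
  Zeros: TBytes;
begin
  if Length(Data) > BlockSize then
    raise Exception.Create('Data does not fit into the block');
  Stream := TFileStream.Create(FileName, fmOpenReadWrite or fmShareExclusive);
  try
    Stream.Position := Offset;
    if Length(Data) > 0 then
      Stream.WriteBuffer(Data[0], Length(Data)); // write 1: the new data
    SetLength(Zeros, BlockSize - Length(Data));
    if Length(Zeros) > 0 then
    begin
      FillChar(Zeros[0], Length(Zeros), 0);
      Stream.WriteBuffer(Zeros[0], Length(Zeros)); // write 2: the zero padding
    end;
  finally
    Stream.Free;
  end;
end;
```

With the question's numbers, the call would look something like `WriteBlock('file.bin', MarkerIndex + 4, NewData, 4088)`, where `MarkerIndex` (also hypothetical) is the 0-based file position of the `$CD` byte. If `Offset + BlockSize` lies past the end of the file, the stream simply grows.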

If you ever find that you need to read or write very small blocks to a file, then I offer you my buffered file streams: Buffered files (for faster disk access).

The code could be optimised further to read only a portion of the file at a time and stop searching once you find the target, which may let you avoid reading the entire file. However, I suspect that the changes above will make enough of a difference.
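One way to read only as much of the file as needed is sketched here; the name `FindInFile` and the 64 KB default chunk size are assumptions, not part of the answer. Each block is scanned with a plain byte comparison, and the last `Length(Pattern) - 1` bytes are carried into the next block so that a marker split across two reads is not missed:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical helper: returns the 0-based file offset of the first occurrence
// of Pattern, or -1. Reads ChunkSize bytes at a time and stops at the first match.
function FindInFile(const FileName: string; const Pattern: array of Byte;
  ChunkSize: Integer = 64 * 1024): Int64;
var
  Stream: TFileStream;
  Buf: TBytes;
  M, Keep, Filled, BytesRead, i: Integer;
  BufStart: Int64; // file offset of Buf[0]
begin
  Result := -1;
  M := Length(Pattern);
  if M = 0 then
    Exit;
  if ChunkSize < M then
    ChunkSize := M;
  SetLength(Buf, ChunkSize + M - 1); // room for a new chunk plus the carried tail
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Keep := 0;
    BufStart := 0;
    repeat
      // Append a new chunk after the bytes kept from the previous chunk
      BytesRead := Stream.Read(Buf[Keep], ChunkSize);
      Filled := Keep + BytesRead;
      // Plain scan of every start position that has M bytes available
      for i := 0 to Filled - M do
        if CompareMem(@Buf[i], @Pattern[0], M) then
          Exit(BufStart + i); // the finally block still frees the stream
      // Carry over the tail that could be the start of a match across the boundary
      Keep := M - 1;
      if Keep > Filled then
        Keep := Filled;
      Move(Buf[Filled - Keep], Buf[0], Keep);
      BufStart := BufStart + Filled - Keep;
    until BytesRead = 0;
  finally
    Stream.Free;
  end;
end;
```

For the question's marker this would be called as `FindInFile('file.bin', [$CD, $09, $D9, $F5])`; if the marker sits near the start of the 2.5 MB file, only the first chunk or two are ever read.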

answered by David Heffernan, Jan 21 '23 16:01
