I have some files (3-5) that i need to compare:
File 1.txt have 1 million strings.
File 2.txt have 10 million strings.
File 3.txt have 5 million strings.
All these files are compared with file keys.txt (10 thousand strings). If line from currently opened file is the same as one of lines from keys.txt, write this line into output.txt (I hope you understand what i mean).
Now i have:
function Thread.checkKeys(sLine: string): boolean;
var
SR: TStreamReader;
line: string;
begin
Result := false;
SR := TStreamReader.Create(sKeyFile); // sKeyFile - Path to file keys.txt
try
while (not(SR.EndOfStream)) and (not(Result))do
begin
line := SR.ReadLine;
if LowerCase(line) = LowerCase(sLine) then
begin
saveStr(sLine);
inc(iMatch);
Result := true;
end;
end;
finally
SR.Free;
end;
end;
procedure Thread.saveStr(sToSave: string);
var
fOut: TStreamWriter;
begin
fOut := TStreamWriter.Create('output.txt', true, TEncoding.UTF8);
try
fOut.WriteLine(sToSave);
finally
fOut.Free;
end;
end;
procedure Thread.updateFiles;
begin
fmMain.flDone.Caption := IntToStr(iFile);
fmMain.flMatch.Caption := IntToStr(iMatch);
end;
And loop with
fInput := TStreamReader.Create(tsFiles[iCurFile]);
while not(fInput.EndOfStream) do
begin
sInput := fInput.ReadLine;
checkKeys(sInput);
end;
fInput.Free;
iFile := iCurFile + 1;
Synchronize(updateFiles);
So, if i compare these 3 files with file key.txt it takes about 4 hours. How to decrease compare time?
You could try a command line diff tool or DiffUtils for Windows. Textpad also has a comparison tool integrated it the files are text. If you just need to detmine if the files are different (not what the differences are) use a checksum comparison tool that uses MD5 or SHA1.
Open any two files (A, B) in Notepad++, which you want to compare. File B (new) gets compared to File A (old). Then, navigate to Plugins > Compare Menu > Compare. It shows the difference/comparison side by side, as shown in the screenshot.
Use the diff command to compare text files. It can compare single files or the contents of directories. When the diff command is run on regular files, and when it compares text files in different directories, the diff command tells which lines must be changed in the files so that they match.
An easy solution is to use an associative container to store your keys. This can provide efficient lookup.
In Delphi you can use TDictionary<TKey,TValue>
from Generics.Collections
. The implementation of this container hashes the keys and provides O(1) lookup.
Declare the container like this:
Keys: TDictionary<string, Boolean>;
// doesn't matter what type you use for the value, we pick Boolean since we
// have to pick something
Create and populate it like this:
Keys := TDictionary<string, Integer>.Create;
SR := TStreamReader.Create(sKeyFile);
try
while not SR.EndOfStream do
Keys.Add(LowerCase(SR.ReadLine), True);
// exception raised if duplicate key found
finally
SR.Free;
end;
Then your checking function becomes:
function Thread.checkKeys(const sLine: string): boolean;
begin
Result := Keys.ContainsKey(LowerCase(sLine));
if Result then
begin
saveStr(sLine);
inc(iMatch);
end;
end;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With