I know how to remove duplicate strings from a TStringList by using dupIgnore on a sorted TStringList.
CallData := TStringList.Create;
CallData.Sorted := True;
CallData.Duplicates := dupIgnore;
But in my case the strings must not be sorted.
Finding duplicates with a for loop (also when using IndexOf()) is very slow when the TStringList has hundreds of thousands of lines.
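if OpenDialog1.Execute then
begin
  // Local variables are not auto-initialised, so nil them before the try
  X := nil; y := nil; f := nil; g := nil;
  try
    // X was never created in the original snippet; assumed to be a local TStringList
    X := TStringList.Create;
    y := TStringList.Create;
    f := TStreamReader.Create(OpenDialog1.FileName, TEncoding.UTF8, True);
    while not f.EndOfStream do
    begin
      l := f.ReadLine;
      X.Add(l);
    end;
    g := TStreamWriter.Create('d:\logX.txt', True, TEncoding.UTF8);
    // Slow part: IndexOf scans y linearly, so this loop is O(n^2) overall
    for I := 0 to X.Count - 1 do
    begin
      if y.IndexOf(X[I]) = -1 then
        y.Add(X[I]);
    end;
    for j := 0 to y.Count - 1 do
      g.WriteLine(y[j]);
  finally
    g.Free;
    f.Free;
    y.Free;
    X.Free;
  end;
end;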
Is there a better way?
Here's how I would approach this problem: keep a dictionary keyed on the strings already seen, and remove from the string list any string that is already in the dictionary.
If there are a large number of duplicates to be removed, then the performance of the above will be affected by repeated removal from the string list. That's because each item to be removed results in the later items being shifted down one index. You can avoid this by copying into a new list rather than deleting in place.
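The copy-into-a-new-list idea can be sketched roughly as follows. This is only an illustration: the helper name CopyUnique is hypothetical, and it assumes a Delphi version with Generics.Collections (the FPC directive on the first line is only there for Free Pascal). Note that TDictionary compares strings case-sensitively by default, whereas TStringList.IndexOf is case-insensitive unless CaseSensitive is set, so the results can differ from the loop in the question.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes, Generics.Collections;

// Hypothetical helper: appends each distinct string of Source to Dest,
// keeping the order of first occurrence. Source is left untouched.
procedure CopyUnique(Source, Dest: TStrings);
var
  Seen: TDictionary<string, Boolean>;
  I: Integer;
begin
  Seen := TDictionary<string, Boolean>.Create;
  try
    Dest.BeginUpdate;
    try
      for I := 0 to Source.Count - 1 do
        if not Seen.ContainsKey(Source[I]) then
        begin
          // The Boolean value is unused; only the key matters
          Seen.Add(Source[I], True);
          Dest.Add(Source[I]);
        end;
    finally
      Dest.EndUpdate;
    end;
  finally
    Seen.Free;
  end;
end;
```

With the variables from the question, this would be called as CopyUnique(X, y) in place of the IndexOf loop. Nothing is ever deleted, so no items get shifted.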
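Alternatively, you can operate in place like this:
1. Initialise a variable named Count to zero.
2. Iterate through the string list in order. For each string that is not yet in the dictionary, add it to the dictionary, copy it into index Count of the list, and then increment Count.
3. Once the iteration is complete, resize the list to have Count elements.
The point of the dictionary is that lookup is an O(1) operation and so the second algorithm has O(n) time complexity.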
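A minimal sketch of the in-place variant, under a few assumptions: the procedure name RemoveDuplicatesInPlace is made up for illustration, the list is not sorted (assigning to an index of a sorted TStringList raises an exception), and only the strings are moved, not any attached Objects. As with any TDictionary keyed on string, the comparison is case-sensitive by default.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes, Generics.Collections;

// Hypothetical helper: removes duplicates from an unsorted list in place,
// keeping the first occurrence of each string in its original order.
procedure RemoveDuplicatesInPlace(List: TStrings);
var
  Seen: TDictionary<string, Boolean>;
  I, Count: Integer;
begin
  Seen := TDictionary<string, Boolean>.Create;
  try
    List.BeginUpdate;
    try
      Count := 0; // number of unique strings kept so far
      for I := 0 to List.Count - 1 do
        if not Seen.ContainsKey(List[I]) then
        begin
          Seen.Add(List[I], True);
          // Compact: move the unique string down to position Count
          if Count <> I then
            List[Count] := List[I];
          Inc(Count);
        end;
      // Trim the leftover tail; deleting from the end shifts nothing
      while List.Count > Count do
        List.Delete(List.Count - 1);
    finally
      List.EndUpdate;
    end;
  finally
    Seen.Free;
  end;
end;
```

Each string is looked up once and moved at most once, and the tail is trimmed from the end, so the work should stay roughly linear in the number of lines.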
I would use trickery, by having a sorted and an unsorted list. Like this:
y := TStringList.Create;
s := TStringList.Create;
s.Sorted := True;
s.Duplicates := dupIgnore;
f := TStreamReader.Create(OpenDialog1.FileName, TEncoding.UTF8, True);
while not f.EndOfStream do
begin
  l := f.ReadLine;
  s.Add(l);
  // s grows only when l is new, so y receives each distinct line once, in file order
  if s.Count > y.Count then y.Add(l);
end;
// etc.
uses
  SysUtils, Classes;

// Orders items by the original line number stored in Objects[]
function CompareObjects(List: TStringList; Index1, Index2: Integer): Integer;
begin
  if Index1 = Index2 then
    Result := 0
  else if NativeInt(List.Objects[Index1]) < NativeInt(List.Objects[Index2]) then
    Result := -1
  else
    Result := 1;
end;

var
  y: TStringList;
  f: TStreamReader;
  i: Integer;
  line: string;

begin
  f := nil;
  y := TStringList.Create;
  try
    y.Sorted := True;
    y.Duplicates := dupIgnore;
    f := TStreamReader.Create('c:\106x\q47780823.bat');
    i := 0;
    while not f.EndOfStream do
    begin
      Inc(i);
      line := f.ReadLine;
      // Attach the line number to the string; duplicates are ignored on add
      y.AddObject(line, TObject(NativeInt(i)));
    end;
    // Un-sort: restore file order using the stored line numbers
    y.Sorted := False;
    y.CustomSort(CompareObjects);
    for i := 0 to y.Count - 1 do
      WriteLn(y[i]);
  finally
    f.Free;
    y.Free;
  end;
  ReadLn;
end.
I'd keep track of the line number (i) and attach it to the string by casting it to an object; sort the list and remove duplicates as before, but then un-sort it using a custom sort on the objects.