I know how to remove duplicate strings from a TStringList using dupignore for a sorted Tstringlist.
CallData := TStringList.Create;
CallData.Sorted := True;
Call.Duplicates := dupIgnore;
But in my case strings must not be sorted .
Using a FOR loop finding duplicates is very slow (also using indexOF())when TStringList has hundreds of thousands of lines .
 if OpenDialog1.Execute then
  begin
    Try
      y := TStringList.create;
      f := TStreamReader.create(OpenDialog1.FileName, TEncoding.UTF8, True);
      while not f.EndOfStream do
      begin
        l := f.ReadLine;
        X.Add(l);
      end;
      g := Tstreamwriter.create('d:\logX.txt', True, TEncoding.UTF8);
      for I := 0 to X.count - 1 do
      begin
          if y.IndexOf(X[I]) = -1 then
          y.Add(X[I]);
      end;
      for j := 0 to y.count - 1 do
        g.WriteLine(y[j]);
    Finally
      f.free;
      y.free;
      g.free;
    End;
  end;
is there any better way ?
Here's how I would approach this problem:
If there are a large number of duplicates to be removed then the performance of the above will be affected by repeated removal from the string list. That's because each item to be removed results in the later items being shifted down one index. You can avoid this by copying into a new list rather than deleting inplace.
Alternatively, you can operate in place like this:
Count to zero. Count of the list, and then increment Count. Count elements. The point of the dictionary is that lookup is an O(1) operation and so the second algorithm has O(n) time complexity.
I would use trickery, by having a sorted and an unsorted list. Like this:
  y := TStringList.create;
  s := TStringList.create;
  s.Sorted := TRUE;
  s.Duplicates := dupIgnore;
  f := TStreamReader.create(OpenDialog1.FileName, TEncoding.UTF8, True);
  while not f.EndOfStream do
  begin
    l := f.ReadLine;
    s.Add(l);
    if s.Count > y.Count then y.Add(l);
  end;
  // etc.
                        function compareobjects
          (list     : Tstringlist;
           index1   : integer;
           index2   : integer
          )         : integer;
begin
  if index1 = index2 then
    result := 0
  else
    if integer(list.objects[index1]) < integer(list.objects[index2]) then
      result := -1
    else
      result := 1;
end;
begin
  Try
    y := TStringList.create;
    y.Sorted := true;
    y.Duplicates := dupignore;
    f := TStreamReader.create('c:\106x\q47780823.bat');
    i := 0;
    while not f.EndOfStream do
    begin
      inc(i);
      line := f.readline;
      y.Addobject(line,tobject(i));
    end;
    y.Sorted := false;
    y.CustomSort(compareobjects);
    for i := 0 to y.count - 1 do
      WriteLn(y[i]);
    Finally
      f.free;
      y.free;
  End;
  readln;
end.
I'd keep track of the line number (i) and assign it with the string by casting as an object; sort the list and remove duplicates as before, but then un-sort it using a custom sort on the objects.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With