Consider this test app:
program RemoveDuplicatesTest;

{$APPTYPE CONSOLE}

uses
  Spring.Collections;

function RemoveDuplicates(const Input: IEnumerable<Integer>): IEnumerable<Integer>;
begin
  // How to implement this function?
end;

var
  Enumerable: IEnumerable<Integer>;
  UniqueEnumerable: IEnumerable<Integer>;
begin
  Enumerable := TCollections.CreateList<Integer>([1, 1, 2, 3, 3, 3, 4]);
  UniqueEnumerable := RemoveDuplicates(Enumerable);
  UniqueEnumerable.ForEach(
    procedure(const I: Integer)
    begin
      WriteLn(I);
    end);
  ReadLn;
end.
How can I implement the RemoveDuplicates function (this is called nub in Haskell)?
Use what is already there:

uses
  Spring.Collections,
  Spring.Collections.Extensions;

function RemoveDuplicates(const Input: IEnumerable<Integer>): IEnumerable<Integer>;
begin
  Result := TDistinctIterator<Integer>.Create(Input, nil);
end;
This supports lazy evaluation: Input is not processed until the resulting enumerable itself is enumerated. Internally it uses a hash set (currently implemented on top of a Dictionary) to keep track of the items already seen; this happens inside the enumerator.
Why is that important? Because any operation that performs a complete enumeration up front can cause an unwanted performance hit if producing Input involves other costly operations, which may by far outweigh any benefit of other approaches to removing duplicates (such as copying the items into a list and sorting it). Also, an IEnumerable is not guaranteed to be finite.
If Input is changed between calling this function and enumerating the result, that change affects the outcome of your enumeration, which would not be the case without lazy evaluation. And if you enumerate multiple times, the outcome may be different (i.e. up to date) each time.
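As a minimal sketch of that behavior (using the RemoveDuplicates from above): the source list is modified after the call but before enumeration, and the change shows up in the output.

var
  List: IList<Integer>;
  Unique: IEnumerable<Integer>;
begin
  List := TCollections.CreateList<Integer>([1, 1, 2]);
  Unique := RemoveDuplicates(List); // nothing has been enumerated yet

  List.Add(2);
  List.Add(5);

  Unique.ForEach(
    procedure(const I: Integer)
    begin
      WriteLn(I); // prints 1, 2, 5 - the items added after the call are included
    end);
end.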
Jens's solution will work, but it has a rather slow running time of O(n²).
A better alternative, if you have a long list, is to:
- sort the list;
- compare every item with its successor.
This has a running time of O(n log n) for the quicksort plus O(n) for the duplicate scan, for a total running time of O(n log n).
See the following code (written without access to Delphi, so double-check the details):

function RemoveDuplicates(const Input: IEnumerable<Integer>): IEnumerable<Integer>;
var
  List: IList<Integer>;
  i: Integer;
begin
  List := TCollections.CreateList<Integer>;
  List.AddRange(Input); // copy the input into a local list
  List.Sort;
  // After sorting, duplicates are adjacent; walk backwards so that
  // deleting an item does not shift the items still to be visited.
  for i := List.Count - 1 downto 1 do
  begin
    if List[i] = List[i - 1] then
      List.Delete(i);
    //if TEqualityComparer<T>.Default.Equals(List[i], List[i-1]) then ...
  end; {for i}
  Result := List; // an IList<Integer> is also an IEnumerable<Integer>
end;
Problems
The problem with this approach is that the output may have a different order than the input. This may or may not be a problem for you.
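If the order does matter, one possible sketch is an eager, order-preserving variant that tracks the items already seen in a set (this assumes Spring4D's TCollections.CreateSet<T> and an ISet<T>.Add that returns False when the item was already present):

function RemoveDuplicatesKeepOrder(const Input: IEnumerable<Integer>): IEnumerable<Integer>;
var
  Seen: ISet<Integer>;
  List: IList<Integer>;
  I: Integer;
begin
  Seen := TCollections.CreateSet<Integer>;
  List := TCollections.CreateList<Integer>;
  for I in Input do
    if Seen.Add(I) then // Add returns False if the item was already seen
      List.Add(I);
  Result := List;
end;

Note that this variant enumerates Input immediately, so it gives up the lazy evaluation discussed in the first answer.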
Benefits (or why the dictionary sucks)
If the sorting is a cheap operation, this will be the fastest approach.
The use of a dictionary carries a high constant cost for the hashing.
Even though a hash lookup is O(1), hashing can get very expensive for large keys, because the hash always processes the whole key, whereas a sorting comparison stops as soon as a difference is detected.
Further note that hashing is a much more expensive operation than a simple comparison (about 30x to 100x slower)!
Only when the list is huge does the better asymptotic running time of the dictionary kick in.
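A minimal illustration of the large-key point, using two hypothetical long string keys: a comparison can bail out at the first differing character, while a hash function has to read every character of the key it hashes.

uses
  System.SysUtils;

var
  A, B: string;
begin
  A := 'a' + StringOfChar('x', 1000);
  B := 'b' + StringOfChar('x', 1000);
  // CompareStr returns after inspecting just the first character,
  // because the strings already differ there; hashing either key
  // would have to process all 1001 characters.
  WriteLn(CompareStr(A, B));
  ReadLn;
end.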