I need a very efficient way to find duplicates in an unsorted sequence. This is what I came up with, but it has a few shortcomings, chief among them that it consumes the whole input before yielding anything.
module Seq =
    let duplicates items =
        items
        |> Seq.countBy id
        |> Seq.filter (snd >> ((<) 1))
        |> Seq.map fst
Regardless of the shortcomings, I don't see a reason to replace this with twice the code. Is it possible to improve this with comparably concise code?
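To make that shortcoming concrete, a quick illustration (my own calls, invoking the function as Seq.duplicates):

[1; 2; 3; 2; 1; 2] |> Seq.duplicates |> Seq.toList   // [1; 2]

// Never yields: Seq.countBy must consume the entire (here infinite) input first.
// Seq.initInfinite (fun i -> i % 10) |> Seq.duplicates |> Seq.head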
A more elegant functional solution:
let duplicates xs =
    Seq.scan (fun xs x -> Set.add x xs) Set.empty xs
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) -> if Set.contains x xs then Some x else None)
Uses scan to accumulate sets of all elements seen so far. Then uses zip to combine each element with the set of elements before it. Finally, uses choose to keep only the elements that are in the set of previously-seen elements, i.e. the duplicates.
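For example (a quick check with an input of my choosing), note that this version reports an element on every occurrence after its first:

duplicates [1; 2; 3; 2; 1; 2] |> Seq.toList   // [2; 1; 2] (2 is reported twice)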
EDIT
Actually my original answer was completely wrong. Firstly, you don't want duplicates in your outputs. Secondly, you want performance.
Here is a purely functional solution that implements the algorithm you're after:
let duplicates xs =
    (Map.empty, xs)
    ||> Seq.scan (fun xs x ->
        match Map.tryFind x xs with
        | None -> Map.add x false xs
        | Some false -> Map.add x true xs
        | Some true -> xs)
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) ->
        match Map.tryFind x xs with
        | Some false -> Some x
        | None | Some true -> None)
This uses a map to track whether each element has been seen once or more than once, and emits an element only when it has been seen exactly once before, i.e. the first time it is duplicated.
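A quick sanity check (my example, not from the answer):

duplicates [1; 2; 3; 2; 1; 2] |> Seq.toList   // [2; 1] (each duplicate exactly once)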
Here is a faster imperative version:
let duplicates (xs: _ seq) =
    seq { let d = System.Collections.Generic.Dictionary(HashIdentity.Structural)
          let e = xs.GetEnumerator()
          while e.MoveNext() do
              let x = e.Current
              let mutable seen = false
              if d.TryGetValue(x, &seen) then
                  if not seen then
                      d.[x] <- true
                      yield x
              else
                  d.[x] <- false }
This is around 2× faster than any of your other answers (at the time of writing).
Using a for x in xs do loop to enumerate the elements in a sequence is substantially slower than using GetEnumerator directly, but generating your own Enumerator is not significantly faster than using a computation expression with yield.
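For reference, a minimal timing harness along these lines (my sketch with an assumed input; not the benchmark behind the 2× figure):

let time label f =
    let sw = System.Diagnostics.Stopwatch.StartNew()
    let r = f ()
    printfn "%s: %d ms" label sw.ElapsedMilliseconds
    r

// One million elements drawn from a small value range to force duplicates.
let input = Array.init 1000000 (fun i -> i % 1000)
time "duplicates" (fun () -> input |> duplicates |> Seq.length) |> ignore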
Note that the TryGetValue member of Dictionary allows me to avoid allocation in the inner loop by mutating a stack-allocated value, whereas the TryGetValue extension member offered by F# (and used by kvb in his/her answer) allocates its return tuple.
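To spell out the contrast (an illustrative snippet of my own; the allocation claim is the answer's):

let d = System.Collections.Generic.Dictionary<string, bool>()

// BCL member: fills a stack-allocated local through a byref,
// so the probe itself allocates nothing.
let mutable seen = false
if d.TryGetValue("key", &seen) then printfn "seen = %b" seen

// Tuple-returning call pattern: each probe materializes a (bool * bool) tuple.
match d.TryGetValue("key") with
| true, s -> printfn "seen = %b" s
| false, _ -> ()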
Here's an imperative solution (which is admittedly slightly longer):
let duplicates items =
    seq {
        let d = System.Collections.Generic.Dictionary()
        for i in items do
            match d.TryGetValue(i) with
            | false, _ -> d.[i] <- false             // first observance
            | true, false -> d.[i] <- true; yield i  // second observance
            | true, true -> ()                       // already seen at least twice
    }
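As a quick check (my example): because the dictionary is filled only as the input is consumed, this stays lazy even over an infinite sequence:

// The first duplicate of 0, 1, 2, 0, 1, 2, ... is 0 (it recurs at index 3).
Seq.initInfinite (fun i -> i % 3) |> duplicates |> Seq.head   // 0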
This is the best "functional" solution I could come up with that doesn't consume the entire sequence up front.
let duplicates xs =
    xs
    |> Seq.scan
        (fun (_, yielded: Set<_>, seen: Set<_>) item ->
            if yielded.Contains item then
                (None, yielded, seen)
            elif seen.Contains item then
                (Some item, yielded.Add item, seen.Remove item)
            else
                (None, yielded, seen.Add item))
        (None, Set.empty, Set.empty)
    |> Seq.choose (fun (x, _, _) -> x)
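A quick check of both properties (my example): each duplicate is reported once, and nothing forces the whole input:

duplicates [1; 2; 3; 2; 1; 2] |> Seq.toList                   // [2; 1]
Seq.initInfinite (fun i -> i % 3) |> duplicates |> Seq.head   // 0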