So if we want a collection of unique items we can use a 'set'.
If we already have a collection of items that we want to dedupe, we could pass them to the set function, or alternatively we could use the distinct or dedupe functions.
What are the situations for using each of these (pros/cons)?
Thanks.
The differences are:
- set will eagerly create a new set collection.
- distinct will create a lazy sequence with duplicates from the input collection removed. It has an advantage over set when you process big collections, because laziness might save you from eagerly evaluating the whole input collection (e.g. when combined with take).
- dedupe removes only consecutive duplicates from the input collection, so it has different semantics than set and distinct. For example, it will return (1 2 3 1 2 3) when applied to (1 1 1 2 3 3 1 1 2 2 2 3 3).
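To make the three behaviours concrete, here is a small REPL-style sketch. The names xs, as-set, as-distinct, as-deduped and first-three are just illustrative, and dedupe assumes Clojure 1.7 or later.

```clojure
;; Sample input with both consecutive and non-consecutive duplicates.
(def xs [1 1 1 2 3 3 1 1 2 2 2 3 3])

;; Eager: builds the whole set up front; iteration order is not guaranteed.
(def as-set (set xs))            ;; => #{1 2 3}

;; Lazy: keeps the first occurrence of each value, in input order.
(def as-distinct (distinct xs))  ;; => (1 2 3)

;; Lazy: collapses only runs of equal neighbours.
(def as-deduped (dedupe xs))     ;; => (1 2 3 1 2 3)

;; Laziness lets distinct work on an infinite input as long as we
;; only ask for a finite prefix; (set ...) on the same input would never return.
(def first-three
  (take 3 (distinct (map #(quot % 2) (range)))))  ;; => (0 1 2)
```

If you need the result to be a set (for membership checks, say), the eager version is usually the simpler choice; the lazy ones pay off mainly when you consume only part of the output.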
Sets and lazy seqs expose different APIs (e.g. disj and get for a set vs nth for a seq) and have different performance characteristics (e.g. O(log32 n) lookup for a set vs O(n) for a lazy seq), so choose between them depending on how you intend to use the result.
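A short sketch of how the API difference shows up in practice. The names s and d, and the sample data, are made up for illustration.

```clojure
(def s (set [3 1 2 3 1]))       ;; a set: #{1 2 3}
(def d (distinct [3 1 2 3 1]))  ;; a lazy seq: (3 1 2)

;; Set operations: membership lookup and removal.
(def has-two?    (contains? s 2))  ;; => true
(def missing     (get s 42))       ;; => nil
(def without-two (disj s 2))       ;; => #{1 3}

;; Seq operations: positional access, which walks the seq from the start.
(def second-item (nth d 1))        ;; => 1
(def first-item  (first d))        ;; => 3

;; Note: nth is not supported on a set, since sets are unordered.
```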
Additionally, distinct and dedupe return a transducer when called without arguments.
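As a rough illustration of the transducer arity (Clojure 1.7+), the zero-argument forms can be passed to into or transduce, or composed with other transducers. The var names below are hypothetical.

```clojure
(def xs [1 1 1 2 3 3 1 1 2 2 2 3 3])

;; Pour the deduplicated elements straight into a vector,
;; without building an intermediate lazy seq.
(def deduped-vec  (into [] (dedupe) xs))    ;; => [1 2 3 1 2 3]
(def distinct-vec (into [] (distinct) xs))  ;; => [1 2 3]

;; Transducers compose with comp; here runs are collapsed first,
;; then each surviving element is incremented.
(def deduped-inc
  (into [] (comp (dedupe) (map inc)) xs))   ;; => [2 3 4 2 3 4]

;; Reduce directly over the distinct elements.
(def distinct-sum (transduce (distinct) + xs))  ;; => 6
```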