
Clojure set vs distinct vs dedupe?

Tags:

clojure

So if we want a collection of unique items we can use a 'set'.

If we already have a collection of items that we want to dedupe, we could pass them to the set function, or alternatively we could use the distinct or dedupe functions.

What are the situations for using each of these (pros/cons)?

Thanks.

Integralist asked May 18 '17 07:05



1 Answer

The differences are:

  • set will create a new set collection eagerly.
  • distinct will create a lazy sequence with duplicates from the input collection removed. It has an advantage over set when you process big collections, because laziness can save you from eagerly evaluating the whole input collection (e.g. when combined with take).
  • dedupe removes only consecutive duplicates from the input collection, so it has different semantics than set and distinct. For example, it will return (1 2 3 1 2 3) when applied to (1 1 1 2 3 3 1 1 2 2 2 3 3).
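As a rough illustration of the three behaviours listed above, here is a minimal REPL-style sketch. The names xs, as-set, as-distinct, as-dedupe and first-three are made up for this example, and it assumes Clojure 1.7 or later, where dedupe is available.

```clojure
;; Hypothetical sample data, same values as in the dedupe bullet.
(def xs [1 1 1 2 3 3 1 1 2 2 2 3 3])

;; Eager: the whole input is consumed to build the set.
;; Sets carry no ordering guarantee.
(def as-set (set xs))            ; => #{1 2 3}

;; Lazy: duplicates are dropped, first-occurrence order is kept.
(def as-distinct (distinct xs))  ; => (1 2 3)

;; Only runs of equal neighbours are collapsed.
(def as-dedupe (dedupe xs))      ; => (1 2 3 1 2 3)

;; Laziness in practice: distinct can sit on top of an infinite
;; sequence as long as only a finite prefix is requested.
;; (set (cycle [1 2 3])) would never return.
(def first-three (take 3 (distinct (cycle [1 2 3]))))  ; => (1 2 3)
```

Note that asking for a fourth element in the last form would never terminate, since the infinite input contains only three distinct values.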

Sets and lazy seqs expose different APIs (e.g. disj and get for a set vs nth for a seq) and have different performance characteristics (e.g. O(log₃₂ n) lookup for a set vs O(n) for a lazy seq), so choose between them depending on how you would like to use the result.
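The API difference might look like this in practice. This is only a sketch; s and d are hypothetical names, and the keyword values are arbitrary.

```clojure
(def s (set [:a :b :a :c]))        ; persistent hash set
(def d (distinct [:a :b :a :c]))   ; lazy seq: (:a :b :c)

;; Set operations: membership, lookup, removal.
(contains? s :b)   ; => true
(get s :z)         ; => nil (not a member)
(disj s :a)        ; => #{:b :c}

;; Seq operations: positional access and linear search.
(nth d 1)          ; => :b
(some #{:c} d)     ; => :c (walks the seq element by element)
```

If the result is mainly used for membership checks, the set is usually the more natural fit; if order of first occurrence matters, distinct preserves it.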

Additionally, distinct and dedupe return a transducer when called without arguments.
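A short sketch of the transducer arity, assuming Clojure 1.7 or later; the var names and input vectors are invented for illustration.

```clojure
;; Calling (distinct) or (dedupe) with no collection yields a
;; transducer that can be handed to into, sequence, transduce, etc.
(def t-distinct (into [] (distinct) [1 1 2 1 3]))    ; => [1 2 3]
(def t-dedupe   (into [] (dedupe)   [1 1 2 1 3 3]))  ; => [1 2 1 3]

;; Transducers compose with comp; here values are halved first,
;; then consecutive repeats are dropped, with no intermediate seq.
(def t-composed
  (into [] (comp (map #(quot % 2)) (dedupe)) [0 1 2 3 4 5]))  ; => [0 1 2]

;; sequence applies the same transducer but returns a lazy seq.
(def t-seq (sequence (distinct) [1 1 2 1 3]))        ; => (1 2 3)
```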

Piotrek Bzdyl answered Sep 21 '22 11:09
