Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Better Sequence Duplicate Remover

Tags:

clojure

I made this function to remove consecutive duplicates, but I wanted to know if there was a better or shorter way to express it using distinct or something like that.

(defn duplicates
  [s]
  (reduce 
    #(if-not (= (last %1) %2) 
      (conj %1 %2) %1) 
    [] s))
like image 452
Mauricio Estevez Avatar asked Sep 20 '25 03:09

Mauricio Estevez


2 Answers

clojure-1.7.0-alpha1 has a correct version of this function under the name dedupe.

The one you quoted returns its input sequence without consecutive duplicates. (Almost certainly) unwittingly, it also swallows all successive nil values if they begin the input sequence.

#(if-not (= (last %1) %2)
    (conj %1 %2)
    %1)

The lambda to reduce says: If the last element of the accumulator (%1) is unequal to the next input element (%2), add it to the accumulator, otherwise return the accumulator.

Because (last []) evaluates to nil it will never add nil values while the accumulator is empty. I leave fixing that as an exercise to the reader:

Make sure that duplicates returns the expected result [nil true nil] for input [nil nil true true nil].

Note: When operating with a vector, using peek performs significantly better than last.

EDIT (Since you edited your question): distinct returns each value of the input sequence only once. Unlike set it returns lazy-sequence.

A more idiomatic way to write duplicates/dedupe is the one that A. Webb posted as a comment (since it is also lazy). Otherwise, fixing the lambda to work correctly with an empty accumulator as its input and using peek instead of last would be more idiomatic.

Instead of fixing the lambda, in clojure-1.7.0-alpha1 you would use the dedupe transducer for eager evaluation, e. g.:

(into [] (dedupe) [nil nil true true nil])
-> [nil true nil] 

Or for lazy evaluation:

(dedupe [nil nil true true nil])
-> (nil true nil)
like image 124
Leon Grapenthin Avatar answered Sep 23 '25 10:09

Leon Grapenthin


The anonymous function inside the reduce conjs an element to a sequence if it's different from the last element. You could rewrite it like this:

(defn maybe-conj [s e]
  (if (= (last s) e)
     s
     (conj s e)))

Then you could rewrite compress-sequence as:

(defn compress-sequence [s]
  (reduce maybe-conj [] s))

What this will do is go through each element of a sequence, and append it to the initial empty vector only if it's different from the last currently in that vector. The output will be a vector of the initial sequence with any runs removed. Example:

(compress-sequence [1 1 1 1 1 2 3 4 5 5 5 5 6 6 6 5 6 6 7 8 9])

evaluates to

[1 2 3 4 5 6 5 6 7 8 9]

like image 41
Diego Basch Avatar answered Sep 23 '25 09:09

Diego Basch