Custom equality in Clojure distinct

Tags:

clojure

In a Clojure program I have a vector of maps containing people's names and emails.

e.g.

[
    { :name "John" :email "[email protected]" }  
    { :name "Batman" :email "[email protected]" }  
    { :name "John Doe" :email "[email protected]" }  
 ] 

I'd like to remove the duplicate entries, treating maps that have the same e-mail as equal for comparison purposes. In the example above the output would be:

[
    { :name "John" :email "[email protected]" }  
    { :name "Batman" :email "[email protected]" }  
 ] 

What's the best way to achieve this in Clojure? Is there a way to let distinct know which equality function to use?

Thanks.

Ricardo Mayerhofer asked Oct 07 '15 17:10


2 Answers

Yet another way to do it, perhaps a bit more idiomatic:

(let [items [{ :name "John" :email "[email protected]" }  
             { :name "Batman" :email "[email protected]" }  
             { :name "John Doe" :email "[email protected]" }]]
  (map first (vals (group-by :email items))))

output:

({:name "John", :email "[email protected]"} 
 {:name "Batman", :email "[email protected]"})

Here is how it works:

(group-by :email items) builds a map whose keys are the emails and whose values are vectors of the records with that email, in the order they appeared:

{"[email protected]" [{:name "John", :email "[email protected]"} 
                   {:name "John Doe", :email "[email protected]"}], 
 "[email protected]" [{:name "Batman", :email "[email protected]"}]}

Then you just take its vals (the groups of records) and pick the first record from each group.
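To make the two steps concrete, here is a small sketch that evaluates each of them separately. The example.com addresses are placeholders assumed for illustration (they are not the addresses from the question), and groups and deduped are just names picked for this sketch.

```clojure
;; Placeholder data: the example.com addresses are assumed for illustration.
(def items
  [{:name "John"     :email "john@example.com"}
   {:name "Batman"   :email "batman@example.com"}
   {:name "John Doe" :email "john@example.com"}])

;; Step 1: bucket the records by email. Each bucket is a vector that keeps
;; the records in the order they appeared in items.
(def groups (group-by :email items))

;; Step 2: keep only the first record of every bucket.
(def deduped (map first (vals groups)))
```

Within a bucket the earliest record wins, but the order of the buckets themselves comes from a plain map, so for larger inputs it should not be relied on to match the input order.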

Another way is to build a sorted set with a comparator on email, so that all records with equal emails are treated as equal records:

(let [items [{ :name "John" :email "[email protected]" }  
             { :name "Batman" :email "[email protected]" }  
             { :name "John Doe" :email "[email protected]" }]]
  (into (sorted-set-by #(compare (:email %1) (:email %2))) items))

output:

#{{:name "Batman", :email "[email protected]"} 
  {:name "John", :email "[email protected]"}}

I don't really know which of them is more idiomatic or performs better, but I'd bet on the first one.

leetwinski answered Sep 20 '22 18:09


This would do it: https://crossclj.info/fun/medley.core/distinct-by.html.

The function in the link goes through the values lazily and remembers the key (the result of the given function) of everything it has seen. If the key of a value in the coll has already been seen, the value is not added to the result.

You could then call this as: (distinct-by #(% :email) maps), where maps is your vector of people-maps.
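If you'd rather not pull in a dependency, the same idea can be sketched in a few lines. This is not medley's source, only a minimal illustration of the approach described above; the local distinct-by helper, the people vector and its example.com addresses are all assumptions made for the example.

```clojure
;; A minimal sketch of the idea, NOT medley's implementation.
(defn distinct-by
  "Returns a lazy seq of the items in coll, dropping every item whose
  (key-fn item) has already been seen. The first occurrence wins."
  [key-fn coll]
  (letfn [(step [xs seen]
            (lazy-seq
              (when-let [[x & more] (seq xs)]
                (let [k (key-fn x)]
                  (if (contains? seen k)
                    ;; key already seen: skip this item
                    (step more seen)
                    ;; new key: emit the item and remember its key
                    (cons x (step more (conj seen k))))))))]
    (step coll #{})))

;; Placeholder data: the example.com addresses are assumed for illustration.
(def people
  [{:name "John"     :email "john@example.com"}
   {:name "Batman"   :email "batman@example.com"}
   {:name "John Doe" :email "john@example.com"}])

(distinct-by :email people)
;; => ({:name "John", :email "john@example.com"}
;;     {:name "Batman", :email "batman@example.com"})
```

Since keywords are functions of maps, :email can be passed directly in place of #(% :email). Unlike the group-by approach, this keeps the surviving records in their input order.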

Heman Gandhi answered Sep 21 '22 18:09