In a Clojure program I have a vector of maps containing people's names and emails.
e.g.
[
{ :name "John" :email "[email protected]" }
{ :name "Batman" :email "[email protected]" }
{ :name "John Doe" :email "[email protected]" }
]
I'd like to remove the duplicate entries, treating maps that have the same e-mail as equal for comparison purposes. In the example above the output would be:
[
{ :name "John" :email "[email protected]" }
{ :name "Batman" :email "[email protected]" }
]
What's the best way to achieve this in Clojure? Is there a way to tell distinct which equality function to use?
Thanks.
Yet another way to do it, perhaps a more idiomatic one:
(let [items [{ :name "John" :email "[email protected]" }
             { :name "Batman" :email "[email protected]" }
             { :name "John Doe" :email "[email protected]" }]]
  (map first (vals (group-by :email items))))
Output:
({:name "John", :email "[email protected]"}
{:name "Batman", :email "[email protected]"})
Here is how it works:
(group-by :email items)
builds a map whose keys are the emails and whose values are the groups of records with that email:
{"[email protected]" [{:name "John", :email "[email protected]"}
{:name "John Doe", :email "[email protected]"}],
"[email protected]" [{:name "Batman", :email "[email protected]"}]}
Then you just need to take its vals (the groups of records) and select the first record from each group.
Another way is to create a sorted set ordered by email, so that it treats all records with equal emails as equal records:
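The two steps above can be sketched separately. This is only an illustration: the addresses are placeholders assumed for the example, and by-email and deduped are hypothetical names. Note that a map built by group-by does not promise to keep insertion order once it grows beyond a handful of keys, so the order of the result should not be relied on.

```clojure
;; Placeholder data, assumed for illustration only.
(def people
  [{:name "John"     :email "john@example.com"}
   {:name "Batman"   :email "batman@example.com"}
   {:name "John Doe" :email "john@example.com"}])

;; Step 1: group the records by email.
;; Keys are emails, values are vectors of records sharing that email.
(def by-email (group-by :email people))

;; Step 2: keep the first record of every group.
(def deduped (map first (vals by-email)))
```

Within each group the records keep their original relative order, so first picks the earliest record for that email.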
(let [items [{ :name "John" :email "[email protected]" }
             { :name "Batman" :email "[email protected]" }
             { :name "John Doe" :email "[email protected]" }]]
  (into (sorted-set-by #(compare (:email %1) (:email %2))) items))
Output:
#{{:name "Batman", :email "[email protected]"}
{:name "John", :email "[email protected]"}}
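A small sketch of why the sorted set removes duplicates, again with placeholder addresses and a hypothetical comparator name (compare-by-email): the set decides equality purely through the comparator, so a record whose email is already present counts as already contained and is not added.

```clojure
;; Placeholder data, assumed for illustration only.
(def people
  [{:name "John"     :email "john@example.com"}
   {:name "Batman"   :email "batman@example.com"}
   {:name "John Doe" :email "john@example.com"}])

;; Comparator that looks at the :email value only.
(defn compare-by-email [a b]
  (compare (:email a) (:email b)))

;; Records with an email already in the set are skipped,
;; so the first record inserted for each email is the one kept.
(def unique (into (sorted-set-by compare-by-email) people))

;; The result is ordered by email, not by input order.
(vec unique)
```

One consequence of this design is that the output comes back sorted by email rather than in the order of the input vector.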
I don't really know which of them is more idiomatic or performs better, but I'd bet on the first one.
This would do it: https://crossclj.info/fun/medley.core/distinct-by.html.
The function in the link goes through the values lazily and remembers the key of every value it has seen. If the key of a value in the coll has already been seen, that value is skipped.
You could then call it as (distinct-by #(% :email) maps), or simply (distinct-by :email maps), where maps is your vector of people-maps.
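If you would rather not add a dependency, the idea can be sketched in a few lines. This is a hand-written illustration of the technique described above, not the library's source, and the addresses are placeholders assumed for the example.

```clojure
;; Hypothetical stand-alone helper: lazily keeps the first item
;; seen for each key returned by f, preserving input order.
(defn distinct-by
  [f coll]
  (letfn [(step [xs seen]
            (lazy-seq
              (when-let [s (seq xs)]
                (let [x (first s)
                      k (f x)]
                  (if (contains? seen k)
                    ;; Key already seen: skip this item.
                    (step (rest s) seen)
                    ;; New key: emit the item and remember its key.
                    (cons x (step (rest s) (conj seen k))))))))]
    (step coll #{})))

;; Placeholder data, assumed for illustration only.
(def people
  [{:name "John"     :email "john@example.com"}
   {:name "Batman"   :email "batman@example.com"}
   {:name "John Doe" :email "john@example.com"}])

(distinct-by :email people)
```

Unlike the group-by and sorted-set approaches, this keeps the surviving records in their original order.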