I need to very efficiently compare two maps in Clojure/Java, and return the difference as determined by Java's .equals(..), with nil/null equivalent to "not present".
That is, I am looking for the most efficient way to write a function like:
(map-difference
  {:a 1, :b nil, :c 2, :d 3}
  {:a 1, :b "Hidden", :c 3, :e 5})
=> {:b nil, :c 2, :d 3, :e nil}
I'd prefer an immutable Clojure map as output, but a Java map would also be fine if the performance improvement would be significant.
For what it's worth, my basic test case / expectation of behaviour is that the following will be equal (up to the equivalence of null = "Not present") for any two maps a and b:
a
(merge b (difference a b))
What would be the best way to implement this?
I'm not sure what the absolutely most efficient way to do this is, but here are a couple of things which may be useful:
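The basic expectation of behaviour from the question text is impossible if taken as strict equality: if a and b are maps such that b contains at least one key not present in a, (merge b <sth>) cannot be equal to a, because merge never removes keys and that extra key therefore survives in the result.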
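To make the point concrete, here is a small sketch using the maps from the question. The names example-a, example-b, example-diff and strip-nils are made up for this illustration; strip-nils is just one possible way to express "nil is the same as not present".

```clojure
;; Maps from the question, plus the difference the question expects.
(def example-a {:a 1, :b nil, :c 2, :d 3})
(def example-b {:a 1, :b "Hidden", :c 3, :e 5})
(def example-diff {:b nil, :c 2, :d 3, :e nil})

;; Hypothetical helper: drop every entry whose value is nil.
(defn strip-nils [m]
  (into {} (remove (comp nil? val) m)))

;; Strict equality fails, because :e is still a key of the merged map.
(= example-a (merge example-b example-diff))
;; => false

;; Equality "up to nil" holds once nil-valued entries are dropped.
(= (strip-nils example-a)
   (strip-nils (merge example-b example-diff)))
;; => true
```

So the round trip can only hold under the relaxed notion of equality, never under plain =.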
If you end up going with an interop solution but then need to go back to a PersistentHashMap at some point, there's always
(clojure.lang.PersistentHashMap/create
  (doto (java.util.HashMap.)
    (.put :foo 1)
    (.put :bar 2)))
; => {:foo 1 :bar 2}
If you need to pass the keyset of a Clojure map to a Java method, you can use
(.keySet {:foo 1 :bar 2})
; => #< [:foo, :bar]>
If all keys involved are guaranteed to be Comparable, this could be exploited for efficient computation of difference on maps with many keys (sort & merge scan). For unconstrained keys this is of course a no-go, and for small maps it could actually hurt performance.
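A rough sketch of what such a sort & merge scan might look like follows. The name merge-scan-difference is made up for this illustration, and the code assumes every key is mutually comparable via compare (for example, all keywords). It treats keys that appear only in the second map the same way as the Clojure versions in this answer, by keeping the second map's value. No performance claim is intended; it would need benchmarking against the hash-based versions.

```clojure
(defn merge-scan-difference
  "Sketch only: assumes all keys of m1 and m2 are mutually comparable."
  [m1 m2]
  (loop [s1  (seq (sort-by key m1))   ; entries of m1 in key order
         s2  (seq (sort-by key m2))   ; entries of m2 in key order
         acc (transient {})]
    (let [[k1 v1] (first s1)
          [k2 v2] (first s2)
          ;; c < 0: k1 comes first; c > 0: k2 comes first; 0: same key
          c (cond (nil? s1) 1
                  (nil? s2) -1
                  :else (compare k1 k2))]
      (cond
        (and (nil? s1) (nil? s2)) (persistent! acc)
        (neg? c)  (recur (next s1) s2 (assoc! acc k1 v1))         ; only in m1
        (pos? c)  (recur s1 (next s2) (assoc! acc k2 v2))         ; only in m2
        (= v1 v2) (recur (next s1) (next s2) acc)                 ; unchanged
        :else     (recur (next s1) (next s2) (assoc! acc k1 v1)))))) ; differs

(merge-scan-difference
  {:a 1, :b nil, :c 2, :d 3}
  {:a 1, :b "Hidden", :c 3, :e 5})
;; => {:b nil, :c 2, :d 3, :e 5}
```

If both inputs were already sorted maps built with the same comparator, the two sort-by calls would be redundant, since seq on a sorted map already yields entries in key order.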
It's good to have a version written in Clojure, if only to set a baseline performance expectation. Here is one: (updated)
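(defn map-difference [m1 m2]
  (loop [m  (transient {})
         ks (concat (keys m1) (keys m2))]
    (if-let [ks (seq ks)] ; test the seq itself, so a nil or false key does not end the loop early
      (let [k  (first ks)
            e1 (find m1 k)
            e2 (find m2 k)]
        (cond
          ;; key in both maps with different values: keep the value from m1
          (and e1 e2 (not= (val e1) (val e2))) (recur (assoc! m k (val e1)) (next ks))
          ;; key only in m2
          (not e1) (recur (assoc! m k (val e2)) (next ks))
          ;; key only in m1
          (not e2) (recur (assoc! m k (val e1)) (next ks))
          ;; key in both maps with equal values: skip it
          :else (recur m (next ks))))
      (persistent! m))))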
I think that just doing (concat (keys m1) (keys m2)) and possibly duplicating some work is likely more efficient most of the time than checking whether a given key is in "the other map" too at every step.
To wrap up the answer, here's a very simple-minded set-based version with the nice property that it says what it does -- if I misunderstood the spec, it should be readily apparent here. :-)
(require '[clojure.set :as set])

(defn map-difference [m1 m2]
  (let [ks1     (set (keys m1))
        ks2     (set (keys m2))
        ks1-ks2 (set/difference ks1 ks2)      ; keys only in m1
        ks2-ks1 (set/difference ks2 ks1)      ; keys only in m2
        ks1*ks2 (set/intersection ks1 ks2)]   ; keys in both maps
    (merge (select-keys m1 ks1-ks2)
           (select-keys m2 ks2-ks1)
           ;; shared keys whose values differ; the value from m1 wins
           (select-keys m1
                        (remove (fn [k] (= (m1 k) (m2 k)))
                                ks1*ks2)))))
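Note that for a key present only in the second map, the versions above return the second map's value, whereas the example in the question expects nil for such a key (:e nil). In case that reading of the spec is the intended one, here is a hedged sketch of a variant. The name map-difference-nil-aware is made up for this illustration, and it compares values with Clojure's = rather than Java's .equals, which is an assumption; swap in .equals (with a nil guard) if strict Java semantics are required.

```clojure
(defn map-difference-nil-aware
  "Sketch only: treats a nil value as equivalent to a missing key."
  [m1 m2]
  (let [;; entries of m1 whose value differs from what m2 holds;
        ;; (get m2 k) is nil for a missing key, so nil and absent coincide
        changed (reduce-kv (fn [acc k v]
                             (if (= v (get m2 k))
                               acc
                               (assoc! acc k v)))
                           (transient {})
                           m1)
        ;; keys with a non-nil value in m2 that m1 lacks entirely: map to nil
        removed (reduce-kv (fn [acc k v]
                             (if (or (nil? v) (contains? m1 k))
                               acc
                               (assoc! acc k nil)))
                           changed
                           m2)]
    (persistent! removed)))

(map-difference-nil-aware
  {:a 1, :b nil, :c 2, :d 3}
  {:a 1, :b "Hidden", :c 3, :e 5})
;; => {:b nil, :c 2, :d 3, :e nil}
```

With this variant, (merge b (map-difference-nil-aware a b)) matches a once nil-valued entries are ignored on both sides, which is the round-trip property the question asks for.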