Difference between two maps

Tags: java, algorithm, hashmap, clojure

I need to very efficiently compare two maps in Clojure/Java, and return the difference as determined by Java's .equals(..), with nil/null equivalent to "not present".

i.e. I am looking for the most efficient way to write a function like:

```clojure
(map-difference
  {:a 1, :b nil, :c 2, :d 3}
  {:a 1, :b "Hidden", :c 3, :e 5})

=> {:b nil, :c 2, :d 3, :e nil}
```

I'd prefer an immutable Clojure map as output, but a Java map would also be fine if the performance improvement would be significant.

For what it's worth, my basic test case / expectation of behaviour is that the following two expressions will be equal (up to the equivalence of null = "not present") for any two maps a and b:

```clojure
a
(merge b (map-difference a b))
```

What would be the best way to implement this?


asked Aug 02 '10 11:08

mikera

1 Answer

I'm not sure what the absolutely most efficient way to do this is, but here are a couple of things that may be useful:

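Taken literally, the basic expectation of behaviour from the question text is impossible: if a and b are maps such that b contains at least one key not present in a, then (merge b <sth>) also contains that key and so cannot be equal to a. It can only hold up to the nil = "not present" equivalence the question mentions, which means the difference has to map such keys to nil.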
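Literal equality does fail, but the weaker property (equality up to nil = "not present") can be checked mechanically by dropping nil-valued entries before comparing. A minimal sketch using the question's example maps and its expected difference written out by hand; `strip-nils` and `nil-equiv=` are hypothetical helper names, not part of clojure.core:

```clojure
(defn strip-nils
  "Hypothetical helper: drops every entry whose value is nil, so that a
  nil value and a missing key look the same."
  [m]
  (into {} (remove (comp nil? val)) m))

(defn nil-equiv=
  "Hypothetical helper: true when a and b are equal once nil-valued
  entries are ignored."
  [a b]
  (= (strip-nils a) (strip-nils b)))

;; The question's example: a, b and the expected difference.
(def a {:a 1, :b nil, :c 2, :d 3})
(def b {:a 1, :b "Hidden", :c 3, :e 5})
(def diff {:b nil, :c 2, :d 3, :e nil})

(= a (merge b diff))          ; false: the merge also contains :e (mapped to nil)
(nil-equiv= a (merge b diff)) ; true
```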

If you end up going with an interop solution but then need to go back to a PersistentHashMap at some point, there's always:

```clojure
(clojure.lang.PersistentHashMap/create
 (doto (java.util.HashMap.)
   (.put :foo 1)
   (.put :bar 2)))
; => {:foo 1 :bar 2} (printed key order may vary)
```
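A rough sketch of what such an interop solution might look like, assuming values should be compared with Java's `.equals` as the question asks (here via `java.util.Objects/equals`, which is null-safe and needs Java 7 or later). `interop-map-difference` is a hypothetical name; the result is converted back with `PersistentHashMap/create` as above. Note that `.equals` is stricter than Clojure's `=` for mixed numeric types: `(= (int 1) 1)` is true, but `Integer.equals` returns false for a `Long` argument.

```clojure
(defn interop-map-difference
  "Hypothetical sketch: collects the difference in a mutable java.util.HashMap,
  comparing values with Java's equals, then converts it to a persistent map.
  A nil value is treated the same as a missing key."
  [^java.util.Map m1 ^java.util.Map m2]
  (let [out (java.util.HashMap.)]
    ;; Every key of m1: .get returns nil for a missing key, so nil and
    ;; "not present" compare equal; on a mismatch keep m1's value.
    (doseq [^java.util.Map$Entry e (.entrySet m1)]
      (let [k (.getKey e)
            v (.getValue e)]
        (when-not (java.util.Objects/equals v (.get m2 k))
          (.put out k v))))
    ;; Keys only in m2: m1 lacks them, so report nil unless m2's value is nil.
    (doseq [^java.util.Map$Entry e (.entrySet m2)]
      (let [k (.getKey e)]
        (when (and (not (.containsKey m1 k)) (some? (.getValue e)))
          (.put out k nil))))
    (clojure.lang.PersistentHashMap/create out)))
```

Mutating a local `HashMap` avoids allocating intermediate persistent maps, at the cost of one final copy into a `PersistentHashMap`.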

If you need to pass the key set of a Clojure map to a Java method, you can use
```
(.keySet {:foo 1 :bar 2})
; => a java.util.Set view of the keys :foo and :bar
```
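As an illustration, the keySet views can be handed directly to `java.util.Set` methods. A small sketch (`keys-only-in` is a hypothetical name, not an existing function) that finds the keys of m1 missing from m2 by copying one view into a `java.util.HashSet` and calling `removeAll`; note that it looks at keys only, so a key mapped to nil still counts as present:

```clojure
(defn keys-only-in
  "Hypothetical helper: a java.util.HashSet of the keys of m1 that are
  absent from m2, computed from the maps' keySet views."
  [^java.util.Map m1 ^java.util.Map m2]
  (doto (java.util.HashSet. (.keySet m1)) ; mutable copy of m1's keys
    (.removeAll (.keySet m2))))           ; drop every key m2 also has

;; a java.util.HashSet containing only :d
(keys-only-in {:a 1, :b nil, :d 3} {:a 1, :b "Hidden", :e 5})
```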
If all keys involved are guaranteed to be Comparable, that could be exploited to compute the difference of maps with many keys efficiently (sort both key sequences, then merge-scan them). For unconstrained keys this is of course a no-go, and for small maps it could actually hurt performance.
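A minimal sketch of the sort & merge-scan idea, assuming every key in both maps is mutually Comparable (keywords are, for example). The name `sorted-map-difference` is hypothetical. It sorts both entry sequences with `sort-by` and walks them in step; sorting dominates at O(n log n), so this is mostly of interest when the entries already come in key order (say, from two sorted-maps), and it is shown to illustrate the scan, not as a measured improvement.

```clojure
(defn sorted-map-difference
  "Hypothetical sketch of the sort & merge-scan idea. Assumes all keys of
  m1 and m2 are mutually Comparable. nil values count as \"not present\";
  the result carries m1's value, or nil where m1 lacks the key."
  [m1 m2]
  (loop [xs  (sort-by key m1)   ; m1's entries in ascending key order
         ys  (sort-by key m2)   ; m2's entries in ascending key order
         acc (transient {})]
    (if (and (empty? xs) (empty? ys))
      (persistent! acc)
      (let [[kx vx] (first xs)
            [ky vy] (first ys)
            ;; negative: the smallest remaining key is only in m1;
            ;; positive: it is only in m2; zero: both maps have it
            c (cond (empty? xs) 1
                    (empty? ys) -1
                    :else       (compare kx ky))]
        (cond
          (neg? c) (recur (rest xs) ys
                          (if (nil? vx) acc (assoc! acc kx vx)))
          (pos? c) (recur xs (rest ys)
                          (if (nil? vy) acc (assoc! acc ky nil)))
          :else    (recur (rest xs) (rest ys)
                          (if (= vx vy) acc (assoc! acc kx vx))))))))
```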

It's good to have a version written in Clojure, if only to set a baseline performance expectation. Here is one (updated):

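```clojure
(defn map-difference [m1 m2]
  ;; Walk the keys of both maps. Keys present in both are visited twice,
  ;; which is harmless: the second assoc! writes the same value again.
  (loop [m  (transient {})
         ks (concat (keys m1) (keys m2))]
    (if (seq ks)
      (let [k  (first ks)
            v1 (get m1 k)   ; nil when k is missing from m1
            v2 (get m2 k)]  ; nil when k is missing from m2
        ;; Because get returns nil for a missing key, nil and "not present"
        ;; compare equal; on a mismatch the result carries m1's value.
        (recur (if (= v1 v2) m (assoc! m k v1))
               (rest ks)))
      (persistent! m))))
```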

I think that just doing (concat (keys m1) (keys m2)) and possibly duplicating some work is likely more efficient most of the time than checking, at every step, whether a given key is also in "the other map".
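For comparison, here is a variant that does perform the membership check the paragraph above argues against, but only for m2's keys, so shared keys are visited once; which of the two is faster is something to measure rather than assume. A sketch using `reduce-kv`, with the hypothetical name `map-difference-kv` and the question's semantics (nil counts as "not present", the result carries m1's value or nil):

```clojure
(defn map-difference-kv
  "Hypothetical variant for benchmarking: two reduce-kv passes that check
  membership in the other map instead of visiting shared keys twice."
  [m1 m2]
  (persistent!
   (reduce-kv
    (fn [acc k v2]
      ;; keys m1 lacks: report nil, unless m2's value is nil as well
      (if (or (contains? m1 k) (nil? v2)) acc (assoc! acc k nil)))
    (reduce-kv
     (fn [acc k v1]
       ;; every key of m1: keep m1's value where it disagrees with m2
       (if (= v1 (get m2 k)) acc (assoc! acc k v1)))
     (transient {})
     m1)
    m2)))
```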

To wrap up the answer, here's a very simple-minded set-based version with the nice property that it says what it does; if I misunderstood the spec, it should be readily apparent here. :-)

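```clojure
(require '[clojure.set :as set])

(defn map-difference [m1 m2]
  (let [ks1     (set (keys m1))
        ks2     (set (keys m2))
        ks1-ks2 (set/difference ks1 ks2)     ; keys only in m1
        ks2-ks1 (set/difference ks2 ks1)     ; keys only in m2
        ks1*ks2 (set/intersection ks1 ks2)]  ; keys in both
    (merge
     ;; only in m1: keep m1's value, unless it is nil (= "not present")
     (select-keys m1 (remove #(nil? (m1 %)) ks1-ks2))
     ;; only in m2: m1 lacks the key, so report nil (unless m2's value is nil)
     (zipmap (remove #(nil? (m2 %)) ks2-ks1) (repeat nil))
     ;; in both: keep m1's value wherever the two maps disagree
     (select-keys m1 (remove (fn [k] (= (m1 k) (m2 k))) ks1*ks2)))))
```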

answered Sep 30 '22 15:09

Michał Marczyk
