I have a list of uids and want to check if a uid is a member of this list The natural way to implement it would be to create a set (<code>clojure.set</code>) of uids and search for that member on that list What I found out is that map key lookup is a lot faster - I used the following snippet to benchmark both methods: <pre class="prettyprint lang-clj prettyprint-override"><code>(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1}) (def uids-map (reduce (fn [acc v] (assoc acc v true)) {} uids)) (time (dotimes [i 1000000] (:o1 uids))) ;user=> "Elapsed time: 191.076266 msecs" (time (dotimes [i 1000000] (:o1 uids-map))) ;user=> "Elapsed time: 38.159388 msecs" </code></pre> the results were very consistent across invocations - map lookup took about 1/5 of set lookup So is set not optimal for key lookup or am I using it the wrong way? Also, what are the reasons for the differences in these benchmarks? I was under the impression that a set is implemented in clojure as an associative data structure similar to vectors - so why is key lookup significantly slower than a simple map?

The implementation of <code>contains?</code> uses <code>clojure.lang.RT.contains</code> which has plenty of <code>instanceof</code> checks (compared to <code>containsKey</code>), which is likely the cause of the performance difference.

Clojure Set vs Map Lookup Performance Differences

Tags:

clojure

I have a list of uids and want to check if a uid is a member of this list

The natural way to implement it would be to create a set (clojure.set) of uids and search for that member on that list

What I found out is that map key lookup is a lot faster - I used the following snippet to benchmark both methods:

(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (reduce (fn [acc v] (assoc acc v true)) {} uids))
(time (dotimes [i 1000000] (:o1 uids)))
;user=> "Elapsed time: 191.076266 msecs"
(time (dotimes [i 1000000] (:o1 uids-map)))
;user=> "Elapsed time: 38.159388 msecs"

the results were very consistent across invocations - map lookup took about 1/5 of set lookup

So is set not optimal for key lookup or am I using it the wrong way?

Also, what are the reasons for the differences in these benchmarks?

I was under the impression that a set is implemented in clojure as an associative data structure similar to vectors - so why is key lookup significantly slower than a simple map?

822

asked Nov 18 '19 09:11

Erez Rabih

2 Answers

I've never went into clojure's source but from what I see the set implementation actually uses a map inside:

protected APersistentSet(IPersistentMap impl){
    this.impl = impl;
}

It also delegates the invoke call to the internal map.

In APersistentSet:

public Object invoke(Object arg1) {
    return get(arg1);
}

// ....

public Object get(Object key){
    return impl.valAt(key);
}

In APersistentMap:

public Object invoke(Object arg1) {
    return valAt(arg1);
}

public Object invoke(Object arg1, Object notFound) {
    return valAt(arg1, notFound);
}

So this can't explain the difference.

As mentioned in the comments by @cgrand, when we reverse the arguments its faster (and about the same, since we call set's invoke immediately). So I looked up Keyword's invoke which is what is probably used for (:k obj):

final public Object invoke(Object obj, Object notFound) {
    if(obj instanceof ILookup)
        return ((ILookup)obj).valAt(this,notFound);
    return RT.get(obj, this, notFound);
}

The important thing to notice is that ILookup is implemented in APersistentMap (through Associative) but not in APersistentSet. You can also verify in clojure:

(instance? clojure.lang.ILookup #{}) ;; false
(instance? clojure.lang.ILookup {})  ;; true

So maps go through the "happy path" and sets end up in RT.get which I believe is the runtime.

Lets have a look at the runtime.

It Initially attempts to do practically the same thing as keyword:

static public Object get(Object coll, Object key){
    if(coll instanceof ILookup)
        return ((ILookup) coll).valAt(key);
    return getFrom(coll, key);
}

But since we know sets do not implement ILookup we know they go to RT.getFrom:

static Object getFrom(Object coll, Object key){
    if(coll == null)
        return null;
    else if(coll instanceof Map) {
        Map m = (Map) coll;
        return m.get(key);
    }
    else if(coll instanceof IPersistentSet) {
        IPersistentSet set = (IPersistentSet) coll;
        return set.get(key);
    }
    else if(key instanceof Number && (coll instanceof String || coll.getClass().isArray())) {
        int n = ((Number) key).intValue();
        if(n >= 0 && n < count(coll))
            return nth(coll, n);
        return null;
    }
    else if(coll instanceof ITransientSet) {
        ITransientSet set = (ITransientSet) coll;
        return set.get(key);
    }

    return null;
}

Which leads me to believe the main difference is the extra delegations and instanceof calls due to sets not implementing ILookup.

As bonus I've added a test on a set that implements ILookup and delegates valAt to the internal map immediately (using proxy) which closed the gap a bit:

(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (into {} (for [k uids] [k k])))
(def lookupable-set (proxy [clojure.lang.APersistentSet clojure.lang.ILookup] [uids-map]
                      (valAt [k] (get uids-map k))))

;; verify
(instance? clojure.lang.APersistentSet lookupable-set) ;; true
(instance? clojure.lang.ILookup lookupable-set) ;; true

(time (dotimes [i 1000000] (:o1 uids))) ;; 134.703101 msecs
(time (dotimes [i 1000000] (:o1 lookupable-set))) ;; 63.187353 msecs  <-- faster
(time (dotimes [i 1000000] (:o1 uids-map))) ;; 35.802762 msecs <-- still fastest

To conclude: Where performance matters - invoking the set (#{...} k) without going through keyword (k #{...}) is as fast as map.

But I could be wrong :)

111

answered Oct 15 '22 15:10

Reut Sharabani

The implementation of contains? uses clojure.lang.RT.contains which has plenty of instanceof checks (compared to containsKey), which is likely the cause of the performance difference.

answered Oct 15 '22 16:10

Jochen Bedersdorfer

Related questions
                            
                                How to compose a Clojure Query with embedded double quotes
                            
                                Does Clojure's identical? function only return true if the things being compared are actually the same instance?
                            
                                Difference between Use, Require and Import
                            
                                How to add days to current date in clojure
                            
                                Leiningen: how to exclude dependencies when uberjaring?
                            
                                loading namespace in cursive REPL
                            
                                Clojure Performance, How to Type hint to r/map
                            
                                Add delay in clojure.core.async
                            
                                Clojure: IllegalArgumentException Key must be integer
                            
                                cljs Om interop with React components?
                            
                                Is there coding convention for better place of 'coll' and 'n' for Clojure?
                            
                                What's the equivalent of Java's computeIfAbsent or putIfAbsent in Clojure?
                            
                                How to rename keys in sequence of maps in clojure
                            
                                Java lambda callback in clojure [duplicate]
                            
                                How to reify Java Interfaces with overloaded method?
                            
                                Adding a custom print method for clojurescript
                            
                                Creating a Clojure spec generator for a nested data structure with constraints across layers
                            
                                What function to use when you want both side effect and retain value in Clojure?
                            
                                How does Clojure spec differ from property-based testing libraries, such as Haskell QuickCheck?
                            
                                How to distinguish between Clojure-JVM and Clojure-CLR

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With