Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Idiomatic/Efficient Clojure way to intersect two a priori sorted vectors?

Tags:

vector

clojure

I have a pair of vectors x and y of unique items, each of which I know to be sorted. I wish to have the intersection of the two, maintaining sort order. The result ideally would be another vector, for fast random access.

The generation below is merely for the sake of example, my x and y will come presorted and pre-distinct (they are in fact time samples).

(defn gen-example [c] (-> (repeatedly c #(-> c rand int)) distinct sort vec))

user=> (def x (gen-example 100000)) (count x)
#'user/x
63161
user=> (def y (gen-example 100000)) (count y)
#'user/y
63224

I know Clojure has clojure.set/intersection which can work on a sorted-set. My x and y have the same properties (sorted distinct elements) but are not the same type.

Question 1: Is there a better/faster way to convert x and y to sorted-sets than (apply sorted-set x) given that they are already distinct and sorted?

user=> (time (def ssx (apply sorted-set x)))
"Elapsed time: 607.642592 msecs"
user=> (time (def ssy (apply sorted-set y)))
"Elapsed time: 617.046022 msecs"

Now I am ready to perform my intersection

user=> (time (count (clojure.set/intersection ssx ssy)))
"Elapsed time: 355.42534 msecs"
39992

This is somewhat disappointing performance, and a cursory look at (source clojure.set/intersection) does not seem to show any special treatment for the fact that these sets are sorted.

Question 2: Is there a better/faster way to perform the intersection of sorted-sets than clojure.set/intersection?

(defn intersect-sorted-vector [x y] 
  (loop [x (seq x) y (seq y) acc []] 
    (if (and x y)
      (let [x1 (first x) 
            y1 (first y)] 
      (cond 
        ( < x1 y1) (recur (next x) y acc) 
        ( > x1 y1) (recur x (next y) acc) 
        :else (recur (next x) (next y) (conj acc x1))))
    acc)))

This turns out to be a good deal (nearly 10x) faster.

user=> (time (count (intersect-sorted-vector x y)))
"Elapsed time: 40.142532 msecs"
39992

But, I can't help but feel that my code is unduly procedural/iterative.

Question 3: Could anyone kindly suggest a more idiomatic way to process a pair of vectors in Clojure?

like image 699
A. Webb Avatar asked Jan 04 '13 16:01

A. Webb


1 Answers

It is often the case that fast Clojure code looks a bit imperative. Functional code is often elegant, but comes with some associated performance costs that you have to pay for (laziness, extra GC pressure from discarded immutable objects etc.)

Also, converting into sets is always going to be more expensive. Building a set is an O(n log n) operation in itself, but you can exploit the fact that the vectors are already supported to implement an intersection operation in O(n) time.

Your code is already pretty good, but there are still a couple more optimisations you can do:

  • Use a transient vector to collect the results. These are a bit faster than regular persistent vectors for lots of sequential conj operations.
  • Used indexed access with primitives into the vectors rather than traversing a sequence with first/next. This avoids creating temporary seq objects (and related GC)

Resulting code might look something like:

(defn intersect-sorted-vector [x y]
  (loop [i (long 0), j (long 0), r (transient [])]
    (let [xi (nth x i nil), yj (nth y j nil)]
      (cond 
        (not (and xi yj)) (persistent! r)
        (< xi yj) (recur (inc i) j r)
        (> xi yj) (recur i (inc j) r)
        :else (recur (inc i) (inc j) (conj! r xi))))))

(time (count (intersect-sorted-vector x y)))
=> "Elapsed time: 5.143687 msecs"
=> 40258

So as you can see, this probably gives you an extra 6-8x speedup or thereabouts.

like image 76
mikera Avatar answered Oct 18 '22 18:10

mikera