Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary search in clojure (implementation / performance)

Tags:

clojure

I wrote a binary search function as part of a larger program, but it seems to be slower than it should be and profiling shows a lot of calls to methods in clojure.lang.Numbers.

My understanding is that Clojure can use primitives when it can determine that it can do so. The calls to the methods in clojure.lang.Numbers seems to indicate that it's not using primitives here.

If I coerce the loop variables to ints, it properly complains that the recur arguments are not primitive. If i coerce those too, the code works again but again it's slow. My only guess is that (quot (+ low-idx high-idx) 2) is not producing a primitive but I'm not sure where to go from here.

This is my first program in Clojure so feel free to let me know if there are more cleaner/functional/Clojure ways to do something.

(defn binary-search
  [coll coll-size target]
  (let [cnt (dec coll-size)]
    (loop [low-idx 0 high-idx cnt]
      (if (> low-idx high-idx)
        nil
        (let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
          (cond
            (= mid-val target) mid-idx
            (< mid-val target) (recur (inc mid-idx) high-idx)
            (> mid-val target) (recur low-idx (dec mid-idx))
            ))))))

(defn binary-search-perf-test
  [test-size]
  (do
    (let [test-set (vec (range 1 (inc test-size))) test-set-size (count test-set)]
      (time (count (map #(binary-search2 test-set test-set-size %) test-set)))
    )))
like image 674
Vadim Avatar asked Jan 21 '12 01:01

Vadim


2 Answers

in Clojure 1.2.x you can only coerce local variables and they can't cross functions calls. starting in Clojure 1.3.0 Clojure can use primative numbers across function calls but not through Higher Order Functions such as map.

if you are using clojure 1.3.0+ then you should be able to accomplish this using type hints

as with any clojure optimization problem the first step is to turn on (set! *warn-on-reflection* true) then add type hints until it no longer complains.

user=> (set! *warn-on-reflection* true)                                          
true
user=> (defn binary-search
  [coll coll-size target]
  (let [cnt (dec coll-size)]
    (loop [low-idx 0 high-idx cnt]
      (if (> low-idx high-idx)
        nil
        (let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
          (cond
            (= mid-val target) mid-idx
            (< mid-val target) (recur (inc mid-idx) high-idx)
            (> mid-val target) (recur low-idx (dec mid-idx))
            ))))))
NO_SOURCE_FILE:23 recur arg for primitive local: low_idx is not matching primitive, 
had: Object, needed: long
Auto-boxing loop arg: low-idx
#'user/binary-search
user=> 

to remove this you can type hint the coll-size argument

(defn binary-search
  [coll ^long coll-size  target]
  (let [cnt (dec coll-size)]
    (loop [low-idx 0 high-idx cnt]
      (if (> low-idx high-idx)
        nil
        (let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
          (cond
            (= mid-val target) mid-idx
            (< mid-val target) (recur (inc mid-idx) high-idx)
            (> mid-val target) (recur low-idx (dec mid-idx))
            ))))))

it is understandably difficult to connect the auto-boxing on line 10 to the coll-size parameter because it goes through cnt then high-idx then mid-ixd and so on, so I generally approach these problems by type-hinting everything until I find the one that makes the warnings go away, then remove hints so long as they stay gone

like image 139
Arthur Ulfeldt Avatar answered Oct 29 '22 13:10

Arthur Ulfeldt


First of all, you can use the binary search implementation provided by java.util.Collections:

(java.util.Collections/binarySearch [0 1 2 3] 2 compare)
; => 2

If you skip the compare, the search will be faster still, unless the collection includes bigints, in which case it'll break.

As for your pure Clojure implementation, you can hint coll-size with ^long in the parameter vector -- or maybe just ask for the vector's size at the beginning of the function's body (that's a very fast, constant time operation), replace the (quot ... 2) call with (bit-shift-right ... 1) and use unchecked math for the index calculations. With some additional tweaks a binary search could be written as follows:

(defn binary-search
  "Finds earliest occurrence of x in xs (a vector) using binary search."
  ([xs x]
     (loop [l 0 h (unchecked-dec (count xs))]
       (if (<= h (inc l))
         (cond
           (== x (xs l)) l
           (== x (xs h)) h
           :else nil)
         (let [m (unchecked-add l (bit-shift-right (unchecked-subtract h l) 1))]
           (if (< (xs m) x)
             (recur (unchecked-inc m) h)
             (recur l m)))))))

This is still noticeably slower than the Java variant:

(defn java-binsearch [xs x]
  (java.util.Collections/binarySearch xs x compare))

binary-search as defined above seems to take about 25% more time than this java-binsearch.

like image 32
Michał Marczyk Avatar answered Oct 29 '22 14:10

Michał Marczyk