I wrote a binary search function as part of a larger program, but it seems to be slower than it should be and profiling shows a lot of calls to methods in clojure.lang.Numbers.
My understanding is that Clojure can use primitives when it can determine that it can do so. The calls to the methods in clojure.lang.Numbers seems to indicate that it's not using primitives here.
If I coerce the loop variables to ints, it properly complains that the recur arguments are not primitive. If i coerce those too, the code works again but again it's slow. My only guess is that (quot (+ low-idx high-idx) 2)
is not producing a primitive but I'm not sure where to go from here.
This is my first program in Clojure so feel free to let me know if there are more cleaner/functional/Clojure ways to do something.
(defn binary-search
[coll coll-size target]
(let [cnt (dec coll-size)]
(loop [low-idx 0 high-idx cnt]
(if (> low-idx high-idx)
nil
(let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
(cond
(= mid-val target) mid-idx
(< mid-val target) (recur (inc mid-idx) high-idx)
(> mid-val target) (recur low-idx (dec mid-idx))
))))))
(defn binary-search-perf-test
[test-size]
(do
(let [test-set (vec (range 1 (inc test-size))) test-set-size (count test-set)]
(time (count (map #(binary-search2 test-set test-set-size %) test-set)))
)))
in Clojure 1.2.x you can only coerce local variables and they can't cross functions calls.
starting in Clojure 1.3.0 Clojure can use primative numbers across function calls but not through Higher Order Functions such as map
.
if you are using clojure 1.3.0+ then you should be able to accomplish this using type hints
as with any clojure optimization problem the first step is to turn on (set! *warn-on-reflection* true)
then add type hints until it no longer complains.
user=> (set! *warn-on-reflection* true)
true
user=> (defn binary-search
[coll coll-size target]
(let [cnt (dec coll-size)]
(loop [low-idx 0 high-idx cnt]
(if (> low-idx high-idx)
nil
(let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
(cond
(= mid-val target) mid-idx
(< mid-val target) (recur (inc mid-idx) high-idx)
(> mid-val target) (recur low-idx (dec mid-idx))
))))))
NO_SOURCE_FILE:23 recur arg for primitive local: low_idx is not matching primitive,
had: Object, needed: long
Auto-boxing loop arg: low-idx
#'user/binary-search
user=>
to remove this you can type hint the coll-size argument
(defn binary-search
[coll ^long coll-size target]
(let [cnt (dec coll-size)]
(loop [low-idx 0 high-idx cnt]
(if (> low-idx high-idx)
nil
(let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)]
(cond
(= mid-val target) mid-idx
(< mid-val target) (recur (inc mid-idx) high-idx)
(> mid-val target) (recur low-idx (dec mid-idx))
))))))
it is understandably difficult to connect the auto-boxing on line 10 to the coll-size parameter because it goes through cnt
then high-idx
then mid-ixd
and so on, so I generally approach these problems by type-hinting everything until I find the one that makes the warnings go away, then remove hints so long as they stay gone
First of all, you can use the binary search implementation provided by java.util.Collections
:
(java.util.Collections/binarySearch [0 1 2 3] 2 compare)
; => 2
If you skip the compare
, the search will be faster still, unless the collection includes bigints, in which case it'll break.
As for your pure Clojure implementation, you can hint coll-size
with ^long
in the parameter vector -- or maybe just ask for the vector's size at the beginning of the function's body (that's a very fast, constant time operation), replace the (quot ... 2)
call with (bit-shift-right ... 1)
and use unchecked math for the index calculations. With some additional tweaks a binary search could be written as follows:
(defn binary-search
"Finds earliest occurrence of x in xs (a vector) using binary search."
([xs x]
(loop [l 0 h (unchecked-dec (count xs))]
(if (<= h (inc l))
(cond
(== x (xs l)) l
(== x (xs h)) h
:else nil)
(let [m (unchecked-add l (bit-shift-right (unchecked-subtract h l) 1))]
(if (< (xs m) x)
(recur (unchecked-inc m) h)
(recur l m)))))))
This is still noticeably slower than the Java variant:
(defn java-binsearch [xs x]
(java.util.Collections/binarySearch xs x compare))
binary-search
as defined above seems to take about 25% more time than this java-binsearch
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With