I am performing element-wise operations on two vectors of about 50,000 elements each, and the performance is unsatisfactory (a few seconds). Are there any obvious performance improvements to be made, such as using a different data structure?
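(defn boolean-compare
  "Count the positions where the two images differ (1 if different, 0 if matching)"
  [proposal-img data-img]
  ;; sum is not in clojure.core; it is defined further down as (partial apply +)
  (sum
   (map
    #(Math/abs (- (first %) (second %)))
    (partition 2 (interleave proposal-img data-img)))))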
Try this:
(apply + (map bit-xor proposal-img data-img))
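Some notes:
- Mapping a function over several collections, as in (map f coll1 coll2), passes one element from each collection as the arguments to the function, so there is no need to interleave and partition for this.
- For data made of 0s and 1s, bit-xor gives the same result as the absolute difference, and it will be faster.

Timed example: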
(def data-img (repeatedly 50000 #(rand-int 2)))
(def proposal-img (repeatedly 50000 #(rand-int 2)))
(def sum (partial apply +))
After warming up the JVM...
(time (boolean-compare proposal-img data-img))
;=> "Elapsed time: 528.731093 msecs"
;=> 24802
(time (apply + (map bit-xor proposal-img data-img)))
;=> "Elapsed time: 22.481255 msecs"
;=> 24802
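A quick check of the two claims in the notes above, as a minimal sketch: for bits in {0, 1}, bit-xor agrees with the absolute difference on every possible pair, and map with two collections pairs elements by position. The names pairs, xor-results, diff-results and mismatches are hypothetical, introduced only for this illustration.

```clojure
;; Every possible pair of bits.
(def pairs [[0 0] [0 1] [1 0] [1 1]])

;; For a, b in {0, 1}, (bit-xor a b) and |a - b| are both 1 exactly
;; when the bits differ, and 0 when they match.
(def xor-results  (mapv (fn [[a b]] (bit-xor a b)) pairs))
;; (long ...) gives a primitive long so Math/abs(long) is chosen statically.
(def diff-results (mapv (fn [[a b]] (Math/abs (long (- a b)))) pairs))

;; map over two collections takes one element from each, positionally,
;; so no interleave/partition step is needed.
(def mismatches (map bit-xor [1 0 1 1] [1 1 0 1]))

(println xor-results diff-results mismatches)
```

The equivalence only holds for 0/1 data; for arbitrary integers bit-xor and the absolute difference generally disagree (e.g. 3 and 1 give 2 either way, but 2 and 1 give 3 versus 1).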
You should look at adopting core.matrix if you are interested in good performance for large vector operations.
In particular, the vectorz-clj library (a core.matrix implementation) has very fast implementations of most common vector operations on double values.
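;; assumes core.matrix and vectorz-clj are on the classpath
(require '[clojure.core.matrix :refer [array sub esum set-current-implementation]])
(set-current-implementation :vectorz)

(def v1 (array (repeatedly 50000 #(rand-int 2))))
(def v2 (array (repeatedly 50000 #(rand-int 2))))
(time (let [d (sub v2 v1)] ;; take difference of two vectors
        (.abs d)           ;; calculate absolute value (mutates d in place)
        (esum d)))         ;; sum elements and return result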
=> "Elapsed time: 0.949985 msecs"
=> 24980.0
i.e. under 20 ns per pair of elements. That's pretty quick: you'd be hard pressed to beat it without resorting to low-level array-fiddling code.
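For comparison, here is a minimal sketch of what the "low-level array-fiddling" route could look like in plain Clojure, assuming the images are stored as primitive int arrays of 0s and 1s. The function name count-mismatches is hypothetical, and this version has not been timed here, so no claim is made about how it compares with vectorz-clj.

```clojure
(defn count-mismatches
  "Hypothetical helper: counts positions where two equal-length int
  arrays differ. Assumes every element is 0 or 1 (so xor = |a - b|)."
  ^long [^ints a ^ints b]
  (let [n (alength a)]
    ;; Primitive loop over the arrays: no lazy seqs, no boxing per element.
    (loop [i 0, acc 0]
      (if (< i n)
        (recur (inc i) (+ acc (bit-xor (aget a i) (aget b i))))
        acc))))

;; Usage: convert the random 0/1 data into primitive int arrays once,
;; then time only the comparison.
(def a (int-array (repeatedly 50000 #(rand-int 2))))
(def b (int-array (repeatedly 50000 #(rand-int 2))))
(time (count-mismatches a b))
```

The type hints (^ints on the arguments, ^long on the return) let the compiler emit direct array access and primitive arithmetic; without them the calls would go through reflection. The conversion to int-array is a one-off cost, so this only pays off if the data is kept in that form across many comparisons.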