Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clojure Performance, How to Type hint to r/map

Below, I have 2 functions computing the sum of squares of their arguments. The first one is nice and functional, but 20x slower than the second one. I presume that the r/map is not taking advantage of aget to retrieve elements from the double-array, whereas I'm explicitly doing this in function 2.

Is there any way I can further typehint or help r/map r/fold to perform faster?

(defn sum-of-squares
  "Given a vector v, compute the sum of the squares of elements."
  ^double [^doubles v]
  (r/fold + (r/map #(* % %) v)))

(defn sum-of-squares2
  "This is much faster than above.  Post to stack-overflow to see."
  ^double [^doubles v]
  (loop [val 0.0
         i (dec (alength v))]
    (if (neg? i)
      val
      (let [x (aget v i)]
        (recur (+ val (* x x)) (dec i))))))

(def a (double-array (range 10)))
(quick-bench (sum-of-squares a))

800 ns

(quick-bench (sum-of-squares2 a))

40 ns

like image 937
Scott Klarenbach Avatar asked Jan 27 '15 20:01

Scott Klarenbach


2 Answers

Before experiments I've added next line in project.clj:

:jvm-opts ^:replace [] ; Makes measurements more accurate

Basic measurements:

(def a (double-array (range 1000000))) ; 10 is too small for performance measurements
(quick-bench (sum-of-squares a)) ; ... Execution time mean : 27.617748 ms ...
(quick-bench (sum-of-squares2 a)) ; ... Execution time mean : 1.259175 ms ...

This is more or less consistent with time difference in the question. Let's try to not use Java arrays (which are not really idiomatic for Clojure):

(def b (mapv (partial * 1.0) (range 1000000))) ; Persistent vector
(quick-bench (sum-of-squares b)) ; ... Execution time mean : 14.808644 ms ...

Almost 2 times faster. Now let's remove type hints:

(defn sum-of-squares3
"Given a vector v, compute the sum of the squares of elements."
[v]
(r/fold + (r/map #(* % %) v)))

(quick-bench (sum-of-squares3 a)) ; Execution time mean : 30.392206 ms
(quick-bench (sum-of-squares3 b)) ; Execution time mean : 15.583379 ms

Execution time increased only marginally comparing to version with type hints. By the way, version with transducers has very similar performance and is much cleaner:

(defn sum-of-squares3 [v]
  (transduce (map #(* % %)) + v))

Now about additional type hinting. We can indeed optimize first sum-of-squares implementation:

(defn square ^double [^double x] (* x x))

(defn sum-of-squares4
  "Given a vector v, compute the sum of the squares of elements."
  [v]
  (r/fold + (r/map square v)))

(quick-bench (sum-of-squares4 b)) ; ... Execution time mean : 12.891831 ms ...

(defn pl
  (^double [] 0.0)
  (^double [^double x] (+ x))
  (^double [^double x ^double y] (+ x y)))

(defn sum-of-squares5
  "Given a vector v, compute the sum of the squares of elements."
  [v]
  (r/fold pl (r/map square v)))

(quick-bench (sum-of-squares5 b)) ; ... Execution time mean : 9.441748 ms ...

Note #1: type hints on arguments and return value of sum-of-squares4 and sum-of-squares5 have no additional performance benefits.

Note #2: It's generally bad practice to start with optimizations. Straight-forward version (apply + (map square v)) will have good enough performance for most situations. sum-of-squares2 is very far from idiomatic and uses literally no Clojure concepts. If this is really performance critical code - better to implement it in Java and use interop. Code will be much cleaner despite of having 2 languages. Or even implement it in unmanaged code (C, C++) and use JNI (not really maintainable but if properly implemented, can give the best possible performance).

like image 74
Jarlax Avatar answered Nov 05 '22 16:11

Jarlax


Why not use areduce:

(def sum-of-squares3 ^double [^doubles v]
  (areduce v idx ret 0.0
           (let [item (aget v idx)]
             (+ ret (* item item)))))

On my machine running:

(criterium/bench (sum-of-squares3 (double-array (range 100000))))

Gives a mean execution time of 1.809103 ms, your sum-of-squares2 executes the same calculation in 1.455775 ms. I think this version using areduce is more idiomatic than your version.

For squeezing a little bit more performance you can try using unchecked math (add-unchecked and multiply-unchecked). But beware, you need to be sure that your calculation cannot overflow:

(defn sum-of-squares4 ^double [^doubles v]
  (areduce v idx ret 0.0
           (let [item (aget v idx)]
             (unchecked-add ret (unchecked-multiply item item)))))

Running the same benchmark gives a mean execution time of 1.144197 ms. Your sum-of-squares2 can also benefit from unchecked math with a 1.126001 ms mean execution time.

like image 25
Rodrigo Taboada Avatar answered Nov 05 '22 16:11

Rodrigo Taboada