 

Why is there no significant speedup using reducers in this example?

(require '[clojure.core.reducers :as r])

(def data (into [] (take 10000000 (repeatedly #(rand-int 1000)))))

(defn frequencies [coll]
  (reduce (fn [counts x]
            (merge-with + counts {x 1}))
          {} coll))

(defn pfrequencies [coll]
  (r/reduce (fn [counts x]
              (merge-with + counts {x 1}))
            {} coll))


user=> (time (do (frequencies data) nil))
"Elapsed time: 29697.183 msecs"

user=> (time (do (pfrequencies data) nil))
"Elapsed time: 25273.794 msecs"

user=> (time (do (frequencies data) nil))
"Elapsed time: 25384.086 msecs"

user=> (time (do (pfrequencies data) nil))
"Elapsed time: 25778.502 msecs"

And can someone show me an example with a significant speedup?

I'm running on Mac OS X 10.7.5 with Java 1.7 on an Intel Core i7 (2 cores, http://ark.intel.com/products/54617).

Asked May 20 '13 by Michiel Borkent


2 Answers

Some serious food for thought in the answers here. In this specific case maps should not be needed, since the result domain is known in advance (integers 0-999) and the counts can be kept in a vector indexed by value. So a naive implementation of a naive problem would be something like:

(defn freqs
  [coll]
  (reduce (fn [counts x] (assoc counts x (inc (get counts x))))
          (vec (int-array 1000 0))
          coll))

(defn rfreqs
  [coll]
  (r/fold
    (fn combinef
      ([] (vec (int-array 1000 0)))
      ([& cols] (apply mapv + cols)))
    (fn reducef
      [counts x] (assoc counts x (inc (get counts x))))
    coll))

Here the combinef is a simple element-wise addition (mapv +) over the 1000 slots of the partial result vectors, which should be a negligible cost.
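
For illustration (a hypothetical REPL snippet, not from the original answer), the combine step is nothing more than summing two partial count vectors slot by slot:

user=> (mapv + [3 0 2] [1 4 2])
[4 4 4]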

This gives the reducer version a speedup of about 2-3x over the normal version, especially on bigger (10x-100x) datasets. Some twiddling with the partition size of r/fold (the optional 'n' parameter) can be done as fine-tuning. (* 16 1024) seemed optimal with a data size of 1e8 (which needs at least a 6 GB JVM).
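
For reference, the partition size is passed as the first argument to r/fold. A sketch of rfreqs with an explicit group size (rfreqs-tuned is my name, and 16384 is just the setting mentioned above, not a universal optimum):

(defn rfreqs-tuned
  [coll]
  ;; r/fold accepts the partition size n as an optional first argument;
  ;; ~16k elements per partition worked well here, but this is machine-dependent
  (r/fold (* 16 1024)
          (fn combinef
            ([] (vec (int-array 1000 0)))
            ([& cols] (apply mapv + cols)))
          (fn reducef
            [counts x] (assoc counts x (inc (get counts x))))
          coll))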

You could even use transients in both versions, but I didn't notice much improvement.
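
For completeness, a rough sketch of what the sequential version with a transient vector could look like (freqs-transient is my naming, not part of the answer):

(defn freqs-transient
  [coll]
  ;; same idea as freqs, but mutates a transient vector and freezes it at the end
  (persistent!
    (reduce (fn [counts x] (assoc! counts x (inc (get counts x))))
            (transient (vec (int-array 1000 0)))
            coll)))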

I know this version isn't appropriate for generic usage, but it might show the speed improvement without the hash management overhead.

Answered by NielsK


A fold version of your frequencies function would look something like

(defn pfrequencies [coll] 
  (r/fold 
    (fn combinef
      ([] {})
      ([x y] (merge-with + x y)))
    (fn reducef
      ([counts x] (merge-with + counts {x 1})))
    coll))

On 2 cores, it will likely be much slower than clojure.core/frequencies, which uses transients. On 4 cores, at least, it is about 2x faster than the first implementation, but still slower than clojure.core/frequencies.
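
For comparison, a single-threaded implementation in the spirit of clojure.core/frequencies (a sketch of the transient-map approach, not the literal core source) looks roughly like this:

(defn tfrequencies [coll]
  ;; mutate a transient map while reducing, then freeze it;
  ;; this avoids allocating a new persistent map on every step
  (persistent!
    (reduce (fn [counts x] (assoc! counts x (inc (get counts x 0))))
            (transient {})
            coll)))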

You might also experiment with

(defn p2frequencies [coll]
  ;; count each 512-element chunk in parallel with pmap,
  ;; then merge the per-chunk frequency maps into one result
  (apply merge-with + (pmap clojure.core/frequencies (partition-all 512 coll))))
Answered by A. Webb