Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tuning clojure reducers library performance

Why is mapping/reducing with reducers library has worse performance than normal map/reduce?

user=> (time (reduce + (map inc (range 100000000))))
"Elapsed time: 14412.627966 msecs"
5000000050000000

user=> (time (reduce + (r/map inc (range 100000000))))
... (C-c)

user=> (time (r/reduce + (r/map inc (range 100000000))))
....(C-c)

I had two kill the later two as it takes indefinitely long. What is wrong here?

EDIT: It seems like other languages also have similar problems. Scala seems to be breaking at a million only. Why do Scala parallel collections sometimes cause an OutOfMemoryError? . While clojure reducers are faster than normal at a million.

like image 968
FUD Avatar asked Feb 26 '14 04:02

FUD


2 Answers

To complement @a-webb's answer, this is a compiler bug and an involved one to really fix. (See this post for more details.)

Another way to work around the problem is to use a fuse:

(defn once-seq
  "Returns a sequence on which seq can be called only once."
  [coll]
  (let [a (atom coll)]
    (reify clojure.lang.Seqable
      (seq [_]
        (let [coll @a]
          (reset! a nil)
          (seq coll))))))

and then:

=> (time (r/reduce + (r/map inc (once-seq (range 100000000)))))
"Elapsed time: 3692.765 msecs"
5000000050000000
like image 67
cgrand Avatar answered Sep 28 '22 00:09

cgrand


Performance is grinding to a halt as memory is being exhausted. If you keep waiting, you would most likely encounter a memory error. Creating a reducer holds the head of the collection in a closure. Thus, the huge lazy-sequence ties up memory as it becomes realized.

Here's what's happening, distilled

user=> (def n 100000000)
#'user/n

user=> (time (dorun (range n)))
"Elapsed time: 12349.034702 msecs"

Now the same, but from within a closure

user=> (defn foo [x] (fn [f] (f x)))
#'user/foo

user=> (time ((foo (range n)) dorun))
OutOfMemoryError GC overhead limit exceeded ... (sometime later)

Compare to

(time (do (doall (range n)) nil))
OutOfMemoryError GC overhead limit exceeded ... (sometime later)

The suspect closure in reducers

user=> (source r/folder)
(defn folder
  "Given a foldable collection, [...]"
  {:added "1.5"}
  ([coll xf]
     (reify
      clojure.core.protocols/CollReduce
      (coll-reduce [_ f1]
                   (clojure.core.protocols/coll-reduce coll (xf f1) (f1)))
   ...

Christophe Grand has a nice post on how to compose reducers in a lazy fashion.

like image 45
A. Webb Avatar answered Sep 27 '22 23:09

A. Webb