In many resources on Reducers (like the canonical blog post by Rich Hickey), it's claimed that reducers are faster than the regular collection functions ((map ... (filter ...)) etc.) because there is less overhead.
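For concreteness, this is the sort of pipeline I have in mind (illustrative; any map/filter chain over a moderately sized collection would do):

    (require '[clojure.core.reducers :as r])

    ;; seq-based pipeline: map and filter each return a lazy seq
    (reduce + (filter even? (map inc (range 1000))))

    ;; reducers pipeline: same shape, same result
    (reduce + (r/filter even? (r/map inc (range 1000))))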
What is the extra overhead that's avoided? IIUC even the lazy collection functions end up walking the original sequence just once. Is the difference in the details of how intermediate results are computed?
Pointers to the relevant places in the Clojure implementation that help explain the difference would be most helpful too.
I think one key insight is in the following passage from the original blog post:
    (require '[clojure.core.reducers :as r])

    (reduce + (r/filter even? (r/map inc [1 1 1 2])))
    ;=> 6
That should look familiar - it's the same named functions, applied in the same order, with the same arguments, producing the same result as the Clojure's seq-based fns. The difference is that, reduce being eager, and these reducers fns being out of the seq game, there's no per-step allocation overhead, so it's faster. Laziness is great when you need it, but when you don't you shouldn't have to pay for it.
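One quick way to see what being "out of the seq game" means: the reducer versions don't return seqs at all, only reducible recipes (this check is mine, not from the blog post):

    (seq? (map inc [1 1 1 2]))   ;=> true  (a clojure.lang.LazySeq)
    (seq? (r/map inc [1 1 1 2])) ;=> false (a reducible object, not a seq)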
The realisation of a lazy sequence comes with a (linear) allocation cost: every time another element of the lazy seq is realised, the rest of the seq is stored as a new thunk, and each such ‘thunk’ is a new clojure.lang.LazySeq object.
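To make that concrete, here is roughly what a non-chunked lazy map looks like. It is a simplification of clojure.core/map (the real one also does chunking), not its actual source:

    (defn lazy-map [f coll]
      (lazy-seq                           ; allocates a fresh clojure.lang.LazySeq
        (when-let [s (seq coll)]          ; whose body is held as an unevaluated thunk
          (cons (f (first s))
                (lazy-map f (rest s)))))) ; realising each element forces one more LazySeq

Walking the result therefore allocates one LazySeq (plus a cons cell) per element, and every extra layer of map/filter adds its own lazy seq on top.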
I believe those LazySeq objects are the allocation overhead referred to in the quote. With reducers there is no gradual realisation of lazy seq elements and therefore no instantiation of LazySeq thunks at all.
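A reducer, by contrast, is just a recipe: it holds the source collection and transforms the reducing function that reduce will eventually apply. A minimal sketch of that idea, using a hypothetical mapping helper as a stand-in for r/map (the real clojure.core.reducers implementation also supports parallel fold via CollFold):

    (require '[clojure.core.protocols :as p])

    (defn mapping [f coll]
      (reify p/CollReduce
        (coll-reduce [_ rf init]
          ;; reduce the source directly, applying f inline:
          ;; no intermediate seq, no per-step LazySeq allocation
          (p/coll-reduce coll (fn [acc x] (rf acc (f x))) init))))

    (reduce + 0 (mapping inc [1 1 1 2]))
    ;=> 9

Because reduce is eager and the transformation lives inside the reducing function itself, the whole pipeline runs as plain nested function calls over the source collection.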