Here's a snippet of code that gives me a StackOverflowError (boiled down from an actual example in my codebase):
( ->> (range 3000)
(mapcat #(concat [0] (take 100 (repeat %))))
(reduce (constantly nil))
(count))
(Note: this code isn't designed to do anything other than demonstrate the problem or return zero.)
I can "rescue" it by any of the following steps:
reduce line[0] to '(0)
(take 100000000) (or any integer) at any point between mapcat and count.I'm basically baffled by this behavior (especially #2). I'd appreciate any input.
(I have a feeling this may be related to Why does reduce give a StackOverflowError in Clojure?, but I can't quite tell how -- so if it is related, I'd appreciate an explanation of why.)
Under normal circumstances, reduce operates using a loop/recur construct and uses constant stack space. However, you've hit a nasty corner case caused by reducing over a sequence produced by providing concat with alternating chunked and non-chunked sequences (the vector [0] is chunked; the seq produced by (take 100 (repeat %)) is non-chunked).
When the first argument to concat is a chunked sequence, then it will return a lazy sequence that will use chunk-cons to produce another chunked sequence. Otherwise, it will use cons to produce a non-chunked sequence.
Meanwhile, the implementation of reduce uses the InternalReduce protocol (defined in clojure.core.protocols) which provides an internal-reduce function for structures that can reduce themselves more efficiently than with the default first/next recursion. The internal-reduce implementation for chunked sequences uses chunk functions to consume the chunked items in a loop up until it's left with a non-chunked sequence, then calls internal-reduce on the remainder. The default internal-reduce implementation similarly uses first/next to consume items in a loop until the underlying seq type changes, then calls internal-reduce on the new seq type to dispatch to the appropriate optimized version. As you progress through the seq that concat produced, alternating between chunked and non-chunked sub-sequences, the internal-reduce calls pile up on the stack and eventually blow it.
A simpler illustration of this case is:
;; All chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat [1]))))
10000
;; All non-chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat '(1)))))
10000
;; Interleaved chunked and non-chunked sub-seqs blows the stack
user> (reduce + (apply concat (take 10000 (interleave (repeat [1]) (repeat '(1))))))
StackOverflowError clojure.lang.LazySeq.seq (LazySeq.java:60)
Examining the stack trace:
StackOverflowError
clojure.core/seq (core.clj:133)
clojure.core/interleave/fn--4525 (core.clj:3901)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:60)
clojure.lang.RT.seq (RT.java:484)
clojure.core/seq (core.clj:133)
clojure.core/take/fn--4232 (core.clj:2554)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:60)
clojure.lang.Cons.next (Cons.java:39)
clojure.lang.RT.next (RT.java:598)
clojure.core/next (core.clj:64)
clojure.core/concat/cat--3925/fn--3926 (core.clj:694)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:60)
clojure.lang.ChunkedCons.chunkedNext (ChunkedCons.java:59)
clojure.core/chunk-next (core.clj:660)
clojure.core.protocols/fn--6041 (protocols.clj:101)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6034 (protocols.clj:147)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6041 (protocols.clj:104)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6034 (protocols.clj:147)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6041 (protocols.clj:104)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6034 (protocols.clj:147)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6041 (protocols.clj:104)
As for your workarounds:
reduce prevents the problem altogether.[0] to '(0) replaces the chunked sub-sequences with non-chunked ones, circumventing the optimization for chunked seqs in internal-reduce and allowing the reduction to happen in a single loop with constant stack space.take creates a new, non-chunked sequence, composed entirely of cons cells.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With