Here's a snippet of code that gives me a StackOverflowError
(boiled down from an actual example in my codebase):
(->> (range 3000)
     (mapcat #(concat [0] (take 100 (repeat %))))
     (reduce (constantly nil))
     (count))
(Note: this code isn't designed to do anything useful; it either demonstrates the problem or, when rescued, returns zero.)
I can "rescue" it by any of the following steps:
reduce
line[0]
to '(0)
(take 100000000)
(or any integer) at any point between mapcat
and count
.I'm basically baffled by this behavior (especially #2). I'd appreciate any input.
(I have a feeling this may be related to Why does reduce give a StackOverflowError in Clojure?, but I can't quite tell how -- so if it is related, I'd appreciate an explanation of why.)
Under normal circumstances, reduce operates using a loop/recur construct and uses constant stack space. However, you've hit a nasty corner case caused by reducing over a sequence produced by providing concat with alternating chunked and non-chunked sequences (the vector [0] is chunked; the seq produced by (take 100 (repeat %)) is non-chunked).
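You can see the split at the REPL with the chunked-seq? predicate (the results below are what I'd expect on any recent Clojure version, since vector seqs have been chunked for a long time):

user> (chunked-seq? (seq [0]))
true
user> (chunked-seq? (seq (take 100 (repeat 1))))
false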
When the first argument to concat is a chunked sequence, it will return a lazy sequence that uses chunk-cons to produce another chunked sequence. Otherwise, it will use cons to produce a non-chunked sequence.
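Correspondingly, it's the first argument to concat that decides which kind of seq you get back (again, the results I'd expect at the REPL; note the seq call, since concat itself returns a LazySeq wrapper):

;; chunked first argument -> chunk-cons -> chunked result
user> (chunked-seq? (seq (concat [0] (take 100 (repeat 1)))))
true
;; non-chunked first argument -> cons -> non-chunked result
user> (chunked-seq? (seq (concat (take 100 (repeat 1)) [0])))
false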
Meanwhile, the implementation of reduce uses the InternalReduce protocol (defined in clojure.core.protocols), which provides an internal-reduce function for structures that can reduce themselves more efficiently than with the default first/next recursion. The internal-reduce implementation for chunked sequences uses the chunk functions to consume the chunked items in a loop until it's left with a non-chunked sequence, then calls internal-reduce on the remainder. The default internal-reduce implementation similarly uses first/next to consume items in a loop until the underlying seq type changes, then calls internal-reduce on the new seq type to dispatch to the appropriate optimized version. As you progress through the seq that concat produced, alternating between chunked and non-chunked sub-sequences, the internal-reduce calls pile up on the stack and eventually blow it.
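The hand-off pattern looks roughly like the sketch below. This is my own simplified rendering, not the actual clojure.core.protocols source (which also goes through protocol dispatch and handles reduced values and other seq types), but it shows the essential shape: each branch loops with recur while the seq type stays the same, and makes a plain, stack-consuming call when it changes.

(declare sketch-seq-reduce)

(defn- sketch-chunked-reduce
  "Consume chunks with recur (constant stack); when the tail is no longer
  chunked, hand off to sketch-seq-reduce with a plain call (a new frame)."
  [s f val]
  (if-let [s (seq s)]
    (if (chunked-seq? s)
      (recur (chunk-next s) f (.reduce ^clojure.lang.IChunk (chunk-first s) f val))
      (sketch-seq-reduce s f val))
    val))

(defn- sketch-seq-reduce
  "Consume items with first/next and recur; when the tail becomes chunked
  again, hand back to sketch-chunked-reduce with a plain call (another frame)."
  [s f val]
  (if-let [s (seq s)]
    (if (chunked-seq? s)
      (sketch-chunked-reduce s f val)
      (recur (next s) f (f val (first s))))
    val))

With a seq whose sub-sequences alternate between chunked and non-chunked, these two functions hand control back and forth once per boundary, and that back-and-forth (plus the protocol dispatch in the real implementation) is exactly the repeating protocols.clj:147 / protocols.clj:104 pair visible in the stack trace below.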
A simpler illustration of this case is:
;; All chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat [1]))))
10000
;; All non-chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat '(1)))))
10000
;; Interleaved chunked and non-chunked sub-seqs blows the stack
user> (reduce + (apply concat (take 10000 (interleave (repeat [1]) (repeat '(1))))))
StackOverflowError clojure.lang.LazySeq.seq (LazySeq.java:60)
Examining the stack trace:
StackOverflowError
clojure.core/seq (core.clj:133)
clojure.core/interleave/fn--4525 (core.clj:3901)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:60)
clojure.lang.RT.seq (RT.java:484)
clojure.core/seq (core.clj:133)
clojure.core/take/fn--4232 (core.clj:2554)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:60)
clojure.lang.Cons.next (Cons.java:39)
clojure.lang.RT.next (RT.java:598)
clojure.core/next (core.clj:64)
clojure.core/concat/cat--3925/fn--3926 (core.clj:694)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:60)
clojure.lang.ChunkedCons.chunkedNext (ChunkedCons.java:59)
clojure.core/chunk-next (core.clj:660)
clojure.core.protocols/fn--6041 (protocols.clj:101)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6034 (protocols.clj:147)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6041 (protocols.clj:104)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6034 (protocols.clj:147)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6041 (protocols.clj:104)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6034 (protocols.clj:147)
clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
clojure.core.protocols/fn--6041 (protocols.clj:104)
As for your workarounds:

1. Removing the reduce prevents the problem altogether.
2. Changing [0] to '(0) replaces the chunked sub-sequences with non-chunked ones, circumventing the optimization for chunked seqs in internal-reduce and allowing the reduction to happen in a single loop with constant stack space (see the example after this list).
3. Adding the take creates a new, non-chunked sequence, composed entirely of cons cells.
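To make workaround 2 concrete, here is your original pipeline with only the [0] changed (per your own observation, it then completes and returns zero):

(->> (range 3000)
     (mapcat #(concat '(0) (take 100 (repeat %)))) ; '(0) is never chunked
     (reduce (constantly nil))
     (count))
;;=> 0

Every sub-sequence concat produces is now cons-based, so internal-reduce never has to switch implementations and the whole reduction runs in a single constant-stack loop.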