I'm trying to concatenate a Seq of Seqs.
I can do it with apply concat:
user=> (count (apply concat (repeat 3000 (repeat 3000 true))))
9000000
However, from my limited knowledge, I would assume that the use of apply
forces the lazy Seq to be realised, and that doesn't seem right for very large inputs. I'd rather do this lazily if I can.
So I thought that using reduce would do the job:
user=> (count (reduce concat (repeat 3000 (repeat 3000 true))))
But this results in
StackOverflowError clojure.lang.RT.seq (RT.java:484)
I'm surprised because I would have thought that the semantics of reduce would mean it was tail-call recursive.
Two questions:

1. Is apply the best way to do this?
2. Is reduce generally inappropriate for large inputs?
apply

When the function argument is lazy, so is apply. Let's check that with a counting side effect on the underlying sub-sequences:
(def counter (atom 0))
(def ss (repeatedly 3000 (fn [] (repeatedly 3000 (fn [] (do (swap! counter inc) true))))))
(def foo (apply concat ss))

so.core=> @counter
0
so.core=> (dorun (take 1 foo))
nil
so.core=> @counter
1
so.core=> (dorun (take 3001 foo))
nil
so.core=> @counter
3001
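For completeness, continuing the same session, fully realizing foo with count should not blow the stack either, and the counter should end at the full 9,000,000 (expected values shown as comments, not a captured transcript):

(count foo)   ;; expect 9000000 — realizes every element, no StackOverflowError
@counter      ;; expect 9000000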
reduce with a large number of concats overflows due to thunk composition

Lazy sequences, such as those produced by concat, are implemented with thunks, delayed function calls. When you concat the result of a concat, you have nested a thunk within another thunk. In your expression the nesting goes 3000 deep, and thus the stack is overflowed as soon as the first item is requested and the 3000 nested thunks are unwound.
so.core=> (def bar (reduce concat (repeat 3000 (repeat 3000 true))))
#'so.core/bar
so.core=> (first bar)
StackOverflowError clojure.lang.LazySeq.seq (LazySeq.java:49)
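To make the nesting shape concrete, here is a small sketch at a size the stack can easily handle (the numbers are arbitrary; only the left-nested pattern matters):

;; (reduce concat [a b c]) builds (concat (concat a b) c):
;; one lazy wrapper per input collection, nested to the left.
(def shallow (reduce concat (repeat 3 (repeat 3 true))))
(count shallow) ;; => 9 — fine at 3 levels, but at 3000 levels realizing the
                ;; first element must unwind all the nested thunks at once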
The implementation of lazy sequences will in general unwind nested thunks trampoline-style when seqed and not blow the stack:
so.core=> (loop [lz [1], n 0]
            (if (< n 3000) (recur (lazy-seq lz) (inc n)) lz))
(1)
However, if you call seq within the lazy sequence on the unrealized portion while realizing it...
so.core=> (loop [lz [1], n 0]
            (if (< n 3000) (recur (lazy-seq (seq lz)) (inc n)) lz))
StackOverflowError so.core/eval1405/fn--1406 (form-init584039696026177116.clj:1)
so.core=> (pst 3000)
StackOverflowError so.core/eval1619/fn--1620 (form-init584039696026177116.clj:2)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.RT.seq (RT.java:484)
clojure.core/seq (core.clj:133)
so.core/eval1619/fn--1620 (form-init584039696026177116.clj:2)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.RT.seq (RT.java:484)
clojure.core/seq (core.clj:133)
... (repeatedly)
Then you end up building up seq stack frames. The implementation of concat works exactly this way; examine the stack trace of your StackOverflowError with concat and you will see something similar.
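For illustration only, here is a toy two-argument concat with the same relevant shape (this is not clojure.core/concat): each level calls seq on its first argument inside a lazy-seq, so when reduce nests thousands of these, realizing the first element has to unwind every level at once.

;; toy concat: note the (seq x) inside the lazy-seq body
(defn my-concat [x y]
  (lazy-seq
    (if-let [s (seq x)]
      (cons (first s) (my-concat (rest s) y))
      y)))

;; (reduce my-concat colls) nests one of these lazy-seqs per collection;
;; deep enough nesting overflows the stack on the first element access.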
I can suggest a way to avoid the problem. The reduce function isn't the problem here; concat is.
Take a look at: https://stuartsierra.com/2015/04/26/clojure-donts-concat
Instead of using concat, use into:
(count (reduce into (repeat 3000 (repeat 3000 true))))
;; => 9000000
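If you prefer to be explicit about the result type, a variant of the same idea is to supply a vector as the initial accumulator; into onto a vector also uses transients internally, so it is reasonably efficient:

(count (reduce into [] (repeat 3000 (repeat 3000 true))))
;; => 9000000, accumulated into a single vector

Note that both this and the reduce into above are eager: the whole result is realized up front, unlike the lazy apply concat version.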