user=> (def r (range 1))
user=> (for [a r, b r, c r, d r, e r, f r, g r, h r :when (and (= 0 a) (not= 1 b))]
(list a b c d e f g h))
((0 0 0 0 0 0 0 0))
user=> (doseq [a r, b r, c r, d r, e r, f r, g r, h r :when (and (= 0 a) (not= 1 b))]
(println (list a b c d e f g h)))
CompilerException java.lang.RuntimeException: Method code too large!, compiling:(/tmp/form-init8346140986526777871.clj:1:1)
This appears to come from clojure.asm.MethodWriter. My Googling for this error with Clojure turns up almost no hits.
So...what on earth is going on? How deep does this rabbit hole go? Is this one line of Clojure code really producing a >65KB method (the value comes from MethodWriter's source)?
If this answer is hitting on the issue I'm running into, then (a) why does chunking mean it grows exponentially instead of linearly? And (b) what are the ramifications for me as a programmer? For example, is this behavior well-known and intended? Should I avoid using doseq
for any situation with more than 3 or 4 bindings? How does this compare to using for
and doall
?
Perhaps related:
Clojure doseq generates huge code
Method code too large! exception using ASM
What you're seeing is a nasty side effect of an optimization that was put into the implementation of the doseq
macro to handle chunked sequences in the input. The answer of the question that you linked correctly describes the underlying cause, but doesn't shed a whole lot of light on why things happen the way they do.
The doseq
implementation internally uses a function that recursively builds up a series of nested loop
constructs, one loop
for each level of bindings in the doseq
. In a naive, unoptimized version of this implementation, the loop at each level would simply run its body and then call recur
with the next
value for its seq. Something along these lines:
(loop [s (seq input)]
(if s
(do (run-body (first s))
(recur (next s)))))
If that seq happens to be a chunked sequence, though, this will cause the unnecessary creation of lots of intermediate seq objects that are never used outside the body of the loop. The optmization that doseq
has made is to put an if
inside the loop
with one branch to handle chunked sequences and one to handle non-chunked sequences. The loop body is duplicated between each branch. If the loop body happens to be a nested loop, then you can see how the exponential increase in code size happens - the loop at each level of the expanded code has two child loops.
So, to answer your question, I wouldn't exactly say the explosion in code size is intended, but it's a consequence of the designed behavior of doseq
. It just wasn't designed to handled deeply nested loops, and in the wild I've never seen it used with more than one or two levels of bindings.
You can reproduce the semantics of a deeply nested doseq
with a combination of for
and dorun
(don't used doall
as this unnecessarily retains the head of the seq). This will allow you to handle any level of nesting, with a slight but measurable performance hit if you happen to be running through a chunked sequence in a tight loop.
user> (time (doseq [x (range 10000) y (range 10000)] (* x y)))
"Elapsed time: 2933.543178 msecs"
user> (time (dorun (for [x (range 10000) y (range 10000)] (* x y))))
"Elapsed time: 5560.90003 msecs"
I had a similar problem when I was creating my own compiler using java.
I declared a very large matrix. For me the solution was separated into small matrix. It's just a suggestion, maybe you can do something similar, such as:
(def r (range 1))
(defn foo [a b c d]
(doseq [e r, f r, g r, h r] (println "Hi")))
(doseq [a r, b r, c r, d r :when (and (= 0 a) (not= 1 b))]
(foo a b c d))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With