Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

For vs. Doseq (and Method Code Too Large)

user=> (def r (range 1))
user=> (for [a r, b r, c r, d r, e r, f r, g r, h r :when (and (= 0 a) (not= 1 b))]
          (list a b c d e f g h))
((0 0 0 0 0 0 0 0))
user=> (doseq [a r, b r, c r, d r, e r, f r, g r, h r :when (and (= 0 a) (not= 1 b))]
          (println (list a b c d e f g h)))
CompilerException java.lang.RuntimeException: Method code too large!, compiling:(/tmp/form-init8346140986526777871.clj:1:1)

This appears to come from clojure.asm.MethodWriter. My Googling for this error with Clojure turns up almost no hits.

So...what on earth is going on? How deep does this rabbit hole go? Is this one line of Clojure code really producing a >65KB method (the value comes from MethodWriter's source)?

If this answer is hitting on the issue I'm running into, then (a) why does chunking mean it grows exponentially instead of linearly? And (b) what are the ramifications for me as a programmer? For example, is this behavior well-known and intended? Should I avoid using doseq for any situation with more than 3 or 4 bindings? How does this compare to using for and doall?

Perhaps related:

Clojure doseq generates huge code

Method code too large! exception using ASM

like image 695
galdre Avatar asked Oct 14 '14 06:10

galdre


2 Answers

What you're seeing is a nasty side effect of an optimization that was put into the implementation of the doseq macro to handle chunked sequences in the input. The answer of the question that you linked correctly describes the underlying cause, but doesn't shed a whole lot of light on why things happen the way they do.

The doseq implementation internally uses a function that recursively builds up a series of nested loop constructs, one loop for each level of bindings in the doseq. In a naive, unoptimized version of this implementation, the loop at each level would simply run its body and then call recur with the next value for its seq. Something along these lines:

(loop [s (seq input)]
  (if s
    (do (run-body (first s))
        (recur (next s)))))

If that seq happens to be a chunked sequence, though, this will cause the unnecessary creation of lots of intermediate seq objects that are never used outside the body of the loop. The optmization that doseq has made is to put an if inside the loop with one branch to handle chunked sequences and one to handle non-chunked sequences. The loop body is duplicated between each branch. If the loop body happens to be a nested loop, then you can see how the exponential increase in code size happens - the loop at each level of the expanded code has two child loops.

So, to answer your question, I wouldn't exactly say the explosion in code size is intended, but it's a consequence of the designed behavior of doseq. It just wasn't designed to handled deeply nested loops, and in the wild I've never seen it used with more than one or two levels of bindings.

You can reproduce the semantics of a deeply nested doseq with a combination of for and dorun (don't used doall as this unnecessarily retains the head of the seq). This will allow you to handle any level of nesting, with a slight but measurable performance hit if you happen to be running through a chunked sequence in a tight loop.

user> (time (doseq [x (range 10000) y (range 10000)] (* x y)))
"Elapsed time: 2933.543178 msecs"

user> (time (dorun (for [x (range 10000) y (range 10000)] (* x y))))
"Elapsed time: 5560.90003 msecs"
like image 194
Alex Avatar answered Oct 15 '22 11:10

Alex


I had a similar problem when I was creating my own compiler using java.

I declared a very large matrix. For me the solution was separated into small matrix. It's just a suggestion, maybe you can do something similar, such as:

(def r (range 1))
(defn foo [a b c d]
   (doseq [e r, f r, g r, h r] (println "Hi")))
(doseq [a r, b r, c r, d r :when (and (= 0 a) (not= 1 b))]
   (foo a b c d))
like image 28
Gustavo Rossi Muller Avatar answered Oct 15 '22 11:10

Gustavo Rossi Muller