Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

clojure - (Another one) StackOverflow with loop/recur

I know this is a recurring question (here, here, and more), and I know that the problem is related to creating lazy sequencies, but I can't see why it fails.

The problem: I had written a (not very nice) quicksort algorithm to sort strings that uses loop/recur. But applied to 10000 elements, I get a StackOverflowError:

(defn qsort [list]
  (loop [[current & todo :as all] [list] sorted []]
    (cond 
       (nil? current) sorted 
       (or (nil? (seq current)) (= (count current) 1)) (recur todo (concat sorted current))
       :else (let [[pivot & rest] current
                  pred #(> (compare pivot %) 0)
                  lt (filter pred rest)
                  gte (remove pred rest)
                  work (list* lt [pivot] gte todo)] 
                (recur work sorted)))))

I used in this way:

(defn tlfnum [] (str/join (repeatedly 10 #(rand-int 10))))
(defn tlfbook [n] (repeatedly n #(tlfnum)))
(time (count (qsort (tlfbook 10000))))

And this is part of the stack trace:

  [clojure.lang.LazySeq seq "LazySeq.java" 49]
  [clojure.lang.RT seq "RT.java" 521]
  [clojure.core$seq__4357 invokeStatic "core.clj" 137]
  [clojure.core$concat$fn__4446 invoke "core.clj" 706]
  [clojure.lang.LazySeq sval "LazySeq.java" 40]
  [clojure.lang.LazySeq seq "LazySeq.java" 49]
  [clojure.lang.RT seq "RT.java" 521]
  [clojure.core$seq__4357 invokeStatic "core.clj" 137]]}

As far as I know, loop/recur performs tail call optimization, so no stack is used (is, in fact, an iterative process written using recursive syntax).

Reading other answers, and because of the stack trace, I see there's a problem with concat and adding a doall before concat solves the stack overflow problem. But... why?

like image 462
danigb Avatar asked Dec 10 '22 11:12

danigb


1 Answers

Here's part of the code for the two-arity version of concat.

(defn concat [x y]
  (lazy-seq
   (let [s (seq x)]
     ,,,))
  )

Notice that it uses two other functions, lazy-seq, and seq. lazy-seq is a bit like a lambda, it wraps some code without executing it yet. The code inside the lazy-seq block has to result in some kind of sequence value. When you call any sequence operation on the lazy-seq, then it will first evaluate the code ("realize" the lazy seq), and then perform the operation on the result.

(def lz (lazy-seq
         (println "Realizing!")
         '(1 2 3)))

(first lz)
;; prints "realizing"
;; => 1

Now try this:

(defn lazy-conj [xs x]
  (lazy-seq
   (println "Realizing" x)
   (conj (seq xs) x)))

Notice that it's similar to concat, it calls seq on its first argument, and returns a lazy-seq

(def up-to-hundred
  (reduce lazy-conj () (range 100)))

(first up-to-hundred)
;; prints "Realizing 99"
;; prints "Realizing 98"
;; prints "Realizing 97"
;; ...
;; => 99

Even though you asked for only the first element, it still ended up realizing the whole sequence. That's because realizing the outer "layer" results in calling seq on the next "layer", which realizes another lazy-seq, which again calls seq, etc. So it's a chain reaction that realizes everything, and each step consumes a stack frame.

(def up-to-ten-thousand
  (reduce lazy-conj () (range 10000)))

(first up-to-ten-thousand)
;;=> java.lang.StackOverflowError

You get the same problem when stacking concat calls. That's why for instance (reduce concat ,,,) is always a smell, instead you can use (apply concat ,,,) or (into () cat ,,,).

Other lazy operators like filter and map can exhibit the exact same problem. If you really have a lot of transformation steps over a sequence consider using transducers instead.

;; without transducers: many intermediate lazy seqs and deep call stacks
(->> my-seq
     (map foo)
     (filter bar)
     (map baz)
     ,,,)


;; with transducers: seq processed in a single pass
(sequence (comp
           (map foo)
           (filter bar)
           (map baz))
          my-seq)
like image 80
Arne Brasseur Avatar answered Jan 04 '23 20:01

Arne Brasseur