Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Beginner question about heap and garbage in Clojure

I have a question about Clojure: I am trying to learn the language by going through Project Euler and I don't understand what is going on under the hood: The following code is designed to use return a list of all prime numbers up to lim. I would think that it should be O(n) in heap-space because I make a list of all the numbers up to lim, and then filter them away one by one while moving the first to a new list. (I know that I am actually making new lists each recur but I didn't think they would take more memory?) Anyway I am running this with

(defn getAllPrimes [lim] 
  (defn getPrimes [primes numlist]
    (if (not-empty numlist) ;base case;
    (recur (cons (first numlist) primes) ;put the prime on to the prime list 
           (filter
        (fn [x] (not (div? x (first numlist)))) ;remove the prime and all its multiples from the numlist
        (rest numlist)))
      primes)); return the primes
  (getPrimes () (range 2 lim))) ;call the recursive function with and empty prime list to be filled up and a full numlist to be emptied

And I keep running out of heap space, when I call

(apply + (getAllPrimes 2000000))

, but I don't run out of space on

(apply + (filter even? (range 2000000)))

So I think that I must not understand how lists are garbage collected in calls to recur and am actually using O(n*n) heap or something.

like image 372
Benjie Holson Avatar asked Aug 10 '10 23:08

Benjie Holson


People also ask

What part of memory stack or heap is cleaned in garbage collection process?

Whenever we create an object, it's always created in the Heap space. Garbage Collection runs on the heap memory to free the memory used by objects that don't have any reference.

Does Clojure have garbage collection?

> Keyword objects are interned by Clojure and are not garbage collected.

What happens if heap memory is full?

When the heap becomes full, garbage is collected. During the garbage collection objects that are no longer used are cleared, thus making space for new objects.

Why stack memory is faster than heap?

The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or free.


1 Answers

I believe the problem is that with each recur, you're creating a new lazy sequence referring to the last one, so after a few iterations you're holding a seq that holds the head of a seq that holds the head of a seq that holds the head of a seq. ... All the intermediate seqs are filling up your heap.

Though writing a prime sieve is a worthwhile exercise, if you want to get to the answer, Clojure does include the sequence of prime numbers in its standard library: clojure.contrib.lazy-seqs/primes. The standard solution to this paricular Euler problem is a one-liner.

As a style point, an inner defn is not a good idea. The practical effect is the same as if the defn were at the top level, but if I'm not mistaken, the var gets reassigned every time getAllPrimes is called, and redefining vars at runtime is very strongly discouraged. Since the code is just defining a var, getPrimes is still just as visible as getAllPrimes. In this case, getPrimes could easily be rewritten as a loop/recur with no inner function, anonymous or named. That doesn't help your chain-of-lazy-seqs problem, but it does make the code a little more standard-looking:

(defn getAllPrimes [lim]
  (loop [primes ()
         numlist (range 2 lim)]
    (if (not-empty numlist)
      (recur (cons (first numlist) primes)
             (filter (fn [x] (not (div? x (first numlist))))
                     (rest numlist)))
      primes)))

I would also avoid the use of camelCase. The Clojure standard name for this function would be get-all-primes.

Getting back to the practical problem, though, the least work you could do to get your code working would be to force each seq on each iteration, i.e., wrap your filter call in a doall. I tried this, and while it still runs slowly, it at least does run to completion without running out of heap:

(defn get-all-primes [lim]
  (loop [primes ()
         numlist (range 2 lim)]
    (if (not-empty numlist)
      (recur (cons (first numlist) primes)
             (doall (filter #(not (div? % (first numlist)))
                            (rest numlist))))
      primes)))
like image 115
dreish Avatar answered Oct 03 '22 18:10

dreish