Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are persistent collections garbage collected?

Tags:

clojure

If I hold e reference to a part of a persistent collection, can the whole collection be garbage collected? Do I understand this correctly?

The function gctest is just to test the behavior of collections.

(defn gctest
  "A shot about testing the gc-ability of persistent thingies."
  [n]
  (take 5 (drop 100 (vec (range n)))))

main=> (def a (gctest 1e7))
main=> (def b (gctest 1e7))
main=> (def c (gctest 1e7))
main=> (def d (gctest 1e7))
main=> (def e (gctest 1e7))
main=> (def f (gctest 1e7))
main=> (def g (gctest 1e7))

OutOfMemoryError GC overhead limit exceeded  clojure.lang.ArrayChunk.dropFirst     (ArrayChunk.java:54)

I already heard about head retention, but this seems a bit more general or isn't it?

What I want to understand is how I can use big changing collection. I expect big parts of the collections change over time, in a way that big parts can be garbage collected in principle, but not all of them.

Is there a standard way to cope with this?

like image 892
Falko Avatar asked Jun 14 '12 16:06

Falko


People also ask

Which objects are garbage collected?

An object is eligible to be garbage collected if its reference variable is lost from the program during execution. Sometimes they are also called unreachable objects. What is reference of an object? The new operator dynamically allocates memory for an object and returns a reference to it.

What is the example of garbage collection?

The main objective of Garbage Collector is to free heap memory by destroying unreachable objects. The garbage collector is the best example of the Daemon thread as it is always running in the background.

Which memory is cleared in garbage collection?

Garbage collection makes Java memory efficient because it removes the unreferenced objects from heap memory and makes free space for new objects. The Java Virtual Machine has eight types of garbage collectors.

What is the complexity of garbage collection?

Garbage collection has to scan memory, thus its complexity must be a function of N, where N may be the number of bytes or the number of objects. It must be bound by a linear function of N, since ultimately it needs to scan all of these objects.

What is garbage collection services?

Garbage collection services refers to the collection and removal of waste that cannot be recycled or reused. For this exercise, waste disposal services (such as landfill site operations) were not included in calculations. Waste diversion programs (such as recycling) were also separated and the benchmarking results are presented here.

What is the purpose of garbage collection in programming?

They aren’t looking for things to delete, they know exactly what has to be removed. Garbage collection is a method to free the programmer from worrying about basic memory management. It isn’t the only option; simple reference counting also covers the vast majority of memory management needs.

Are garbage collection costs included in the cost of garbage collection?

Some cities have included the cost of waste disposal in their costs of garbage collection, whereas other cities have not. Waste disposal facilities are the single most expensive component of the waste program and these costs continue to increase as environmental regulations become more stringent.


2 Answers

Standard GC rule remains: as long as you keep the reference to a part of collection, all objects accessible from your references stay in memory. So only the part of collection that is accessible from your references will be hold, the rest will be collected. In particular, if you refer to the last 50 elements of 100 element list, first 50 elements will be collected and the rest will stay in memory.

However, in your case all elements of each collection starting from 100th will be kept. And the reason for it is lazy evaluation. Function take produces lazy sequence of (in your case) 5 elements. Lazy sequence object itself is not real sequence, instead it is special generator object (though it's not Clojure's, but rather Python's term). When you need element of lazy sequence, generator object generates and returns it. But if you do not ask for the element, generator just keeps references to all objects it may need to generate an element.

In your example you create large vector and ask for 5 elements from it, and then save result to variables a, b, c, etc. Clojure makes large vector and generator object, pointing to 100th element. Reference to collection itself is lost, but reference to generator object is saved on top level. You never evaluate generator objects and thus never make real 5 element sequences. REPL refers to vars a, b, c, etc., these vars refer to generator objects, and generator objects refer to the collections they need to produce real 5 element sequences. Thus all elements (except may be first 100 of them) of all collections have to stay in memory.

On other hand, if you evaluate generator objects, they will produce real 5 element sequences and forget reference to the rest of collection. Try this:

user> (def a (gctest 1e7))
#'user/a                                                                                                                                               
user> (println a)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def b (gctest 1e7))
#'user/b                                                                                                                                               
user> (println b)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def c (gctest 1e7))
#'user/c                                                                                                                                               
user> (println c)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def d (gctest 1e7))
#'user/d                                                                                                                                               
user> (println d)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def e (gctest 1e7))
#'user/e                                                                                                                                               
user> (println e)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def f (gctest 1e7))
#'user/f                                                                                                                                               
user> (println f)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def g (gctest 1e7))
#'user/g                                                                                                                                               
user> (println g)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def h (gctest 1e7))
#'user/h                                                                                                                                               
user> (println h)
(100 101 102 103 104)                                                                                                                                  
nil                                                                                                                                                    
user> (def i (gctest 1e7))
#'user/i                                                                                                                                               
user> (println i)
(100 101 102 103 104)                                                                                                                                  
nil 

There's no OutOfMemory! Vars a, b, c, etc. now store real lists of 5 elements, and thus there are no more references to large collections, so they can be collected.

like image 66
ffriend Avatar answered Oct 29 '22 01:10

ffriend


These kinds of problems can be solved using lazy sequences. In your case you have used vec function which actually create a vector in memory by going though each element generated by the range function (range returns a lazy sequence).

Below code (without vec call won't have the memory problem)

(defn gctest
  "A shot about testing the gc-ability of persistent thingies."
  [n]
  (take 5 (drop 100 (range n))))

UPDATE:

When you use the vec call, all the elements in the memory will be retained and not collected by GC because these elements are referenced (and required) by the sequence object returned from the function gctest so that it can fetch the required elements (i.e skipping 100 elements and taking 5 elements) when the sequence object is asked for elements.

like image 41
Ankur Avatar answered Oct 29 '22 01:10

Ankur