Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What should I do to let GC free the unused memory in OCaml?

Tags:

ocaml

I am doing a benchmark for mergesort and quicksort.

I implemented Random_list.create, Mergesort.sort_list and Quicksort.sort_list. We can assume the three functions are correctly implemented and the implementations are not important in this question.

What I want to ask is about OCaml's GC.


Here is my benchmark code:

let _ = 
  let l = Random_list.create 10000000 in

  let len1 = List.length (Mergesort.sort_list l) in
  Printf.printf "mergesort done for %d elements" len1;

  let len2 = List.length (Quicksort.sort_list l) in
  Printf.printf "quicksort done for %d elements" len2

If I run the code above, it tells me Fatal error: exception Out_of_memory after mergesort done for 10000000 elements.

It is out of memory, no problem. also the output tells me Out_of_memory occurred after successful mergesort.


Then what I did is split the code and test separately:

let _ = 
      let l = Random_list.create 10000000 in
      let len1 = List.length (Mergesort.sort_list l) in
      Printf.printf "mergesort done for %d elements" len1

and then

let _ = 
      let l = Random_list.create 10000000 in
      let len2 = List.length (Quicksort.sort_list l) in
      Printf.printf "quicksort done for %d elements" len2

Both runs fine without Out_of_memory.


Here is my question:

From my benchmark code, Yes, I did serial sorts: mergesort and then quicksort.

During the execution, there should be 3 major lists created: l and the list from mergesort and the list from quicksort.

However, the list created from mergesort should be GCed before quicksort, right? and that list doesn't have any reference to it, right?

Before quicksort, there is only one major list which the original l, right?

Why it still gives Out_of_memory error?

like image 695
Jackson Tale Avatar asked Jul 18 '13 14:07

Jackson Tale


2 Answers

I think the problem resides in the fact that you are using very large lists. The garbage collector keeps two different heap to manage memory :

  • A minor heap for tiny/short-lived objects.
  • A major heap for long lasting objects.

The minor heap is cleared regularly, and if an object is alive long enough, it is promoted to the major heap.

However, really large objects go straight to the major heap. The thing is that the major heap requires to stop the world, i.e. halt the application. Therefore, major heap collection is done in several steps to ensure the application is not stopped for a long time, and is also not done quite as often as a minor heap collection.

Maybe in your case the merge_sort list is still not collected when you start the quick sort, and therefore all 3 lists are present in the memory at the same time.

You can ask the GC to do a full major collection to see if it fix the problem :

let _ = 
  let l = Random_list.create 10000000 in

  let len1 = List.length (Mergesort.sort_list l) in
  Printf.printf "mergesort done for %d elements" len1;

  Gc.full_major ();

  let len2 = List.length (Quicksort.sort_list l) in
  Printf.printf "quicksort done for %d elements" len2
like image 75
ChristopheLec Avatar answered Oct 28 '22 23:10

ChristopheLec


It is difficult to conclude without seeing the code, but could it be that in the first program, there are pointers to Mergesort.sort_list l and Quicksort.sort_list l in the stack frame, preventing the first list from being garbage collected, while in the second case the stack frame is deallocated between the two let _ = ...?

like image 36
David Monniaux Avatar answered Oct 28 '22 22:10

David Monniaux