I am doing a benchmark for mergesort and quicksort.
I implemented Random_list.create
, Mergesort.sort_list
and Quicksort.sort_list
. We can assume the three functions are correctly implemented and the implementations are not important in this question.
What I want to ask is about OCaml's GC.
Here is my benchmark code:
let _ =
let l = Random_list.create 10000000 in
let len1 = List.length (Mergesort.sort_list l) in
Printf.printf "mergesort done for %d elements" len1;
let len2 = List.length (Quicksort.sort_list l) in
Printf.printf "quicksort done for %d elements" len2
If I run the code above, it tells me Fatal error: exception Out_of_memory
after mergesort done for 10000000 elements
.
It is out of memory, no problem. also the output tells me Out_of_memory
occurred after successful mergesort
.
Then what I did is split the code and test separately:
let _ =
let l = Random_list.create 10000000 in
let len1 = List.length (Mergesort.sort_list l) in
Printf.printf "mergesort done for %d elements" len1
and then
let _ =
let l = Random_list.create 10000000 in
let len2 = List.length (Quicksort.sort_list l) in
Printf.printf "quicksort done for %d elements" len2
Both runs fine without Out_of_memory
.
Here is my question:
From my benchmark code, Yes, I did serial sorts: mergesort and then quicksort.
During the execution, there should be 3 major lists created: l
and the list from mergesort and the list from quicksort.
However, the list created from mergesort should be GCed
before quicksort, right? and that list doesn't have any reference to it, right?
Before quicksort, there is only one major list which the original l
, right?
Why it still gives Out_of_memory
error?
I think the problem resides in the fact that you are using very large lists. The garbage collector keeps two different heap to manage memory :
The minor heap is cleared regularly, and if an object is alive long enough, it is promoted to the major heap.
However, really large objects go straight to the major heap. The thing is that the major heap requires to stop the world, i.e. halt the application. Therefore, major heap collection is done in several steps to ensure the application is not stopped for a long time, and is also not done quite as often as a minor heap collection.
Maybe in your case the merge_sort list is still not collected when you start the quick sort, and therefore all 3 lists are present in the memory at the same time.
You can ask the GC to do a full major collection to see if it fix the problem :
let _ =
let l = Random_list.create 10000000 in
let len1 = List.length (Mergesort.sort_list l) in
Printf.printf "mergesort done for %d elements" len1;
Gc.full_major ();
let len2 = List.length (Quicksort.sort_list l) in
Printf.printf "quicksort done for %d elements" len2
It is difficult to conclude without seeing the code, but could it be that in the first program, there are pointers to Mergesort.sort_list l
and Quicksort.sort_list l
in the stack frame, preventing the first list from being garbage collected, while in the second case the stack frame is deallocated between the two let _ = ...
?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With