
MATLAB's Garbage Collector?

What is your mental model of it? How is it implemented? Which strengths and weaknesses does it have? MATLAB GC vs. Python GC?

I sometimes see strange performance bottlenecks when using MATLAB nested functions in otherwise innocuous-looking code, and I am sure the GC is the cause. The garbage collector is an important part of any VM, yet MathWorks does not document it publicly.

My question is about MATLAB's own heap and GC! It is not about handling Java/COM objects, preventing "out of memory" errors, or allocating stack variables.

EDIT: the first response is actually the meta-answer "Why should I care?". I do care because the GC manifests itself when implementing a linked list or the MVC pattern.

Mikhail Poda asked Sep 18 '09 18:09


1 Answer

This is the list of facts I have collected. In this context the term "memory (de)allocation" seems more appropriate than "GC".

My principal information source is the blog of Loren (especially its comments) and this article from MATLAB Digest.

Because it is oriented towards numeric computing with potentially large data sets, MATLAB does a really good job of optimizing the performance of stack objects, for example by operating on data in place and by passing function arguments by reference (they are copied only when modified). For the same reason, its memory model is fundamentally different from that of OO languages such as Java.

Officially, MATLAB had no user-defined heap memory until version 7 (version 6 had undocumented reference functionality in schema.m files). MATLAB 7 has a heap in the form of both nested functions (closures) and handle objects, whose implementations share the same underpinnings. As a side note, OO can be emulated with closures in MATLAB (interesting for releases before 2008a).

Surprisingly, it is possible to examine the entire workspace of the enclosing function captured by a function handle (closure); see the function functions(fhandle) in the MATLAB Help. This means that the enclosing workspace is kept frozen in memory, which is why cellfun/arrayfun are sometimes very slow when used inside nested functions.
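A minimal sketch of that inspection, assuming a single file inspect_closure.m. The names inspect_closure, payload, counter and bump are hypothetical; only functions() comes from the text above. Which variables actually show up in the captured workspace may differ between releases, so the code only lists them rather than assuming a fixed set.

```matlab
function info = inspect_closure()
    % Variables that live in the enclosing function's workspace.
    payload = rand(500);   % some data the closure keeps alive
    counter = 0;

    fh = @bump;            % handle to the nested function below (a closure)

    % functions() returns a struct describing the handle.
    info = functions(fh);
    disp(info.type);

    % The captured workspace is usually wrapped in a cell array.
    ws = info.workspace;
    if iscell(ws)
        ws = ws{1};
    end
    disp(fieldnames(ws));  % names of the captured variables

    function bump()
        % Uses both variables of the enclosing function.
        counter = counter + numel(payload);
    end
end
```

Calling info = inspect_closure() from the command line prints the handle type and the captured variable names, and returns the full struct for further inspection.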

There are also interesting posts by Loren and Brad Phelan on object cleanup.

The most interesting fact about heap deallocation in MATLAB is, in my opinion, that MATLAB attempts it every time the stack is deallocated, i.e. on leaving every function. This has advantages, but it also carries a huge CPU penalty if heap deallocation is slow. And in some scenarios it really is very slow in MATLAB!
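The deterministic cleanup on function exit can be observed with a small sketch, assuming a release that provides onCleanup (R2008a or later). The function names demo_deterministic_cleanup and inner are hypothetical. An onCleanup object is destroyed as soon as the variable holding it goes out of scope, so its message appears before control returns to the caller.

```matlab
function demo_deterministic_cleanup()
    fprintf('entering caller\n');
    inner();
    fprintf('back in caller\n');
end

function inner()
    % The onCleanup object is destroyed when this function's workspace
    % is torn down, which runs the callback immediately.
    c = onCleanup(@() fprintf('cleanup ran on leaving inner\n'));
    fprintf('inside inner\n');
end
```

The cleanup message is printed between "inside inner" and "back in caller", which shows that the work happens on the way out of the function rather than at some later, collector-chosen time.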

The performance problems that MATLAB memory deallocation can cause are pretty bad. I always notice that I have unintentionally introduced cyclic references into my code when it suddenly runs 20x slower and sometimes needs several seconds between leaving a function and returning to its caller (time spent on cleanup). It is a known problem; see Dave Foti and this older forum post, whose code was used to produce the performance figure (the tests were run on different machines, so comparing absolute timings across MATLAB versions is meaningless).

A linear increase in the pool size of reference objects causes a polynomial (or even exponential) drop in MATLAB performance! For value objects the cost grows linearly, as expected.
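A rough way to reproduce this kind of measurement is sketched below, assuming a release with classdef support (R2008a or later) and a file named PoolNode.m. This is not the code from the forum post; the class PoolNode and its method timeCleanup are hypothetical. It builds a doubly linked list, so neighbouring nodes reference each other, and times how long it takes to release the whole pool.

```matlab
% File: PoolNode.m (hypothetical class, for illustration only)
classdef PoolNode < handle
    properties
        Next = []   % handle of the following node, or empty
        Prev = []   % handle of the preceding node, or empty
    end
    methods (Static)
        function t = timeCleanup(n)
            % Build a pool of n nodes linked in both directions,
            % so that each adjacent pair forms a reference cycle.
            head = PoolNode();
            tail = head;
            for k = 2:n
                node = PoolNode();
                node.Prev = tail;
                tail.Next = node;
                tail = node;
            end
            clear node tail   % keep only one external reference

            % Time the release of the last external reference.
            tic;
            clear head
            t = toc;
        end
    end
end
```

Calling PoolNode.timeCleanup(n) for a few modest pool sizes, for example 500, 1000 and 2000, gives a feel for how cleanup time scales. The numbers depend heavily on the MATLAB release and the machine.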

Considering these facts, I can only speculate that MATLAB uses a not yet very efficient form of reference counting for heap deallocation.

EDIT: I had always encountered performance problems with many small nested functions, but recently I noticed that, at least in 2006a, cleaning up a single nested scope holding a few megabytes of data is also terrible: it takes 1.5 seconds just to set a nested-scope variable to empty!

EDIT 2: I finally got the answer, from Dave Foti himself. He acknowledges the flaws but says that MATLAB is going to retain its present deterministic cleanup approach.

[Figure: performance comparison for R2006a, R2008a and R2009a. Legend: shorter execution time is better.]

Mikhail Poda answered Sep 23 '22 14:09