Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

About closure, LexicalEnvironment and GC

as ECMAScriptv5, each time when control enters a code, the enginge creates a LexicalEnvironment(LE) and a VariableEnvironment(VE), for function code, these 2 objects are exactly the same reference which is the result of calling NewDeclarativeEnvironment(ECMAScript v5 10.4.3), and all variables declared in function code are stored in the environment record componentof VariableEnvironment(ECMAScript v5 10.5), and this is the basic concept for closure.

What confused me is how Garbage Collect works with this closure approach, suppose I have code like:

function f1() {
    var o = LargeObject.fromSize('10MB');
    return function() {
        // here never uses o
        return 'Hello world';
    }
}
var f2 = f1();

after the line var f2 = f1(), our object graph would be:

global -> f2 -> f2's VariableEnvironment -> f1's VariableEnvironment -> o

so as from my little knowledge, if the javascript engine uses a reference counting method for garbage collection, the object o has at lease 1 refenrence and would never be GCed. Appearently this would result a waste of memory since o would never be used but is always stored in memory.

Someone may said the engine knows that f2's VariableEnvironment doesn't use f1's VariableEnvironment, so the entire f1's VariableEnvironment would be GCed, so there is another code snippet which may lead to more complex situation:

function f1() {
    var o1 = LargeObject.fromSize('10MB');
    var o2 = LargeObject.fromSize('10MB');
    return function() {
        alert(o1);
    }
}
var f2 = f1();

in this case, f2 uses the o1 object which stores in f1's VariableEnvironment, so f2's VariableEnvironment must keep a reference to f1's VariableEnvironment, which result that o2 cannot be GCed as well, which further result in a waste of memory.

so I would ask, how modern javascript engine (JScript.dll / V8 / SpiderMonkey ...) handles such situation, is there a standard specified rule or is it implementation based, and what is the exact step javascript engine handles such object graph when executing Garbage Collection.

Thanks.

like image 282
otakustay Avatar asked Dec 29 '11 09:12

otakustay


2 Answers

tl;dr answer: "Only variables referenced from inner fns are heap allocated in V8. If you use eval then all vars assumed referenced.". In your second example, o2 can be allocated on the stack and is thrown away after f1 exits.


I don't think they can handle it. At least we know that some engines cannot, as this is known to be the cause of many memory leaks, as for example:

function outer(node) {
    node.onclick = function inner() { 
        // some code not referencing "node"
    };
}

where inner closes over node, forming a circular reference inner -> outer's VariableContext -> node -> inner, which will never be freed in for instance IE6, even if the DOM node is removed from the document. Some browsers handle this just fine though: circular references themselves are not a problem, it's the GC implementation in IE6 that is the problem. But now I digress from the subject.

A common way to break the circular reference is to null out all unnecessary variables at the end of outer. I.e., set node = null. The question is then whether modern javascript engines can do this for you, can they somehow infer that a variable is not used within inner?

I think the answer is no, but I can be proven wrong. The reason is that the following code executes just fine:

function get_inner_function() {
    var x = "very big object";
    var y = "another big object";
    return function inner(varName) {
        alert(eval(varName));
    };
}

func = get_inner_function();

func("x");
func("y");

See for yourself using this jsfiddle example. There are no references to either x or y inside inner, but they are still accessible using eval. (Amazingly, if you alias eval to something else, say myeval, and call myeval, you DO NOT get a new execution context - this is even in the specification, see sections 10.4.2 and 15.1.2.1.1 in ECMA-262.)


Edit: As per your comment, it appears that some modern engines actually do some smart tricks, so I tried to dig a little more. I came across this forum thread discussing the issue, and in particular, a link to a tweet about how variables are allocated in V8. It also specifically touches on the eval problem. It seems that it has to parse the code in all inner functions. and see what variables are referenced, or if eval is used, and then determine whether each variable should be allocated on the heap or on the stack. Pretty neat. Here is another blog that contains a lot of details on the ECMAScript implementation.

This has the implication that even if an inner function never "escapes" the call, it can still force variables to be allocated on the heap. E.g.:

function init(node) {

    var someLargeVariable = "...";

    function drawSomeWidget(x, y) {
        library.draw(x, y, someLargeVariable);
    }

    drawSomeWidget(1, 1);
    drawSomeWidget(101, 1);

    return function () {
        alert("hi!");
    };
}

Now, as init has finished its call, someLargeVariable is no longer referenced and should be eligible for deletion, but I suspect that it is not, unless the inner function drawSomeWidget has been optimized away (inlined?). If so, this could probably occur pretty frequently when using self-executing functions to mimick classes with private / public methods.


Answer to Raynos comment below. I tried the above scenario (slightly modified) in the debugger, and the results are as I predict, at least in Chrome:

Screenshot of Chrome debugger When the inner function is being executed, someLargeVariable is still in scope.

If I comment out the reference to someLargeVariable in the inner drawSomeWidget method, then you get a different result:

Screenshot of Chrome debugger 2 Now someLargeVariable is not in scope, because it could be allocated on the stack.

like image 112
waxwing Avatar answered Sep 19 '22 06:09

waxwing


There is no standard specifications of implementation for GC, every engine have their own implementation. I know a little concept of v8, it has a very impressive garbage collector (stop-the-world, generational, accurate). As above example 2, the v8 engine has following step:

  1. create f1's VariableEnvironment object called f1.
  2. after created that object the V8 creates an initial hidden class of f1 called H1.
  3. indicate the point of f1 is to f2 in root level.
  4. create another hidden class H2, based on H1, then add information to H2 that describes the object as having one property, o1, store it at offset 0 in the f1 object.
  5. updates f1 point to H2 indicated f1 should used H2 instead of H1.
  6. creates another hidden class H3, based on H2, and add property, o2, store it at offset 1 in the f1 object.
  7. updates f1 point to H3.
  8. create anonymous VariableEnvironment object called a1.
  9. create an initial hidden class of a1 called A1.
  10. indicate a1 parent is f1.

On parse function literal, it create FunctionBody. Only parse FunctionBody when function was called.The following code indicate it not throw error while parser time

function p(){
  return function(){alert(a)}
}
p();

So at GC time H1, H2 will be swept, because no reference point that.In my mind if the code is lazily compiled, no way to indicate o1 variable declared in a1 is a reference to f1, It use JIT.

like image 27
Shihua Ma Avatar answered Sep 21 '22 06:09

Shihua Ma