 

Why is a certain block closure optimization good and valid?

Tags:

smalltalk

In a very interesting post from 2001, Allen Wirfs-Brock explains how to implement block closures without reifying the (native) stack.

Of the many ideas he presents, there is one I don't quite understand, and I thought it would be a good idea to ask about it here. He says:

Any variable that can never be assigned during the lifetime of a block (e.g., arguments of enclosing methods and blocks) need not be placed in the environment if instead a copy of the variable is placed in the closure when it is created
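To make the quoted idea concrete, here is a minimal sketch in Python, used only for illustration (it is neither Smalltalk nor Wirfs-Brock's actual design). A variable that may be assigned while the block is alive lives in a shared environment, modeled as a one-element list. A variable that is never assigned (here a method argument) is simply copied into the closure when the closure is created. The names `Closure`, `make_method_activation`, `copied` and `env` are hypothetical.

```python
class Closure:
    """Hypothetical closure object: the block's code plus its captured state."""

    def __init__(self, code, copied, env):
        self.code = code             # the block's body
        self.copied = tuple(copied)  # read-only values, copied at creation time
        self.env = env               # shared environment (None if not needed)

    def value(self, *args):
        return self.code(self.copied, self.env, *args)


def make_method_activation(base):
    # 'base' is a method argument: it is never assigned, so it can be copied.
    # 'count' is assigned inside the block, so it must live in a shared environment.
    env = [0]  # env[0] holds 'count'

    def block_code(copied, environment, x):
        (base_copy,) = copied
        environment[0] += 1   # assignment goes through the shared environment
        return base_copy + x  # read of the copied, never-assigned value

    block = Closure(block_code, [base], env)
    return block, env
```

Because `base` can never change, the copy in `block.copied` and the argument in the method can never disagree, so it needs no shared slot; only `count` pays for the indirection through `env`.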

There are two things I'm not sure I understand well enough:

  1. Why is using two copies of the read-only variable faster than moving the variable to the environment? Is it because it would be faster for the enclosing context to access the (original) variable on the stack?
  2. How can we ensure that the two variables remain synchronized?

For question 1, there must be another reason; otherwise I don't see the gain when compared with the cost of implementing the optimization.

For question 2, take a variable that is not an argument and that is assigned in the method but not in the block. Why would the oop stored on the stack remain unchanged during the life of the block?

I think I know the answer to Q2: because the execution of the block cannot be intertwined with the execution of the method, i.e., while the block lives, the enclosing context does not run. But is there really no way to modify the stack temporary while the block is alive?

asked Jan 21 '15 at 21:01 by Leandro Caniglia


1 Answer

Thanks to a comment by @aka.nice, I found the answers to both questions in Clement Bera's post, which is both pleasant and illuminating to read.

For Q1, note first that Allen's remark means the copy of the read-only variable can be placed in the block's stack, as if it were a local temporary of the block. The advantage of doing this only materializes if all variables defined outside the block and used inside it are never written in the block. Under these circumstances there is no need to create the environment array, nor to emit any prologue or epilogue code to manage it.
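The condition above can be sketched as a compile-time decision. This assumes that an earlier analysis (not shown) has already determined, for each outer variable the block references, whether it can be assigned while the block is alive; `OuterVar` and `plan_captures` are hypothetical names. When no variable needs a slot in the environment array, the method needs neither the array nor the code to create it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OuterVar:
    name: str
    assigned_during_block_lifetime: bool  # assumed to come from a prior analysis


def plan_captures(outer_vars):
    """Split the outer variables a block uses into copied values and tempVector slots.

    Returns (copied_names, temp_vector_names). An empty temp_vector_names list
    means the enclosing method needs no environment array and no prologue or
    epilogue code to manage it.
    """
    copied = [v.name for v in outer_vars if not v.assigned_during_block_lifetime]
    temp_vector = [v.name for v in outer_vars if v.assigned_during_block_lifetime]
    return copied, temp_vector
```

For example, `plan_captures([OuterVar("a", False), OuterVar("b", False)])` returns `(["a", "b"], [])`: everything is copied and no tempVector is created.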

The machine code that accesses a stack variable is equivalent to the code required to access an environment variable: the former addresses the location using [ebp + offset], while the latter uses [edi + offset], once edi has been set to point to the environment array (tempVector in Clement's notation). So there is no gain if some, but not all, of the environment variables are read-only.
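A rough cost model shows why a partial split buys nothing. It counts abstract units instead of emitting real x86: every slot access costs one load, whether it is `[ebp + offset]` or `[edi + offset]`, and needing an environment at all costs a fixed setup (allocating the array and loading its address into a register). Both constants and the function name `access_cost` are illustrative assumptions, not measurements.

```python
# Illustrative constants, not measurements.
ENV_SETUP_COST = 10  # allocate the tempVector and load its address into a register
SLOT_LOAD_COST = 1   # one load, whether [ebp + offset] or [edi + offset]


def access_cost(n_copied_accesses, n_env_accesses, env_needed):
    """Approximate cost of a block activation under the model above."""
    setup = ENV_SETUP_COST if env_needed else 0
    # Per-access cost is identical for stack copies and environment slots.
    return setup + SLOT_LOAD_COST * (n_copied_accesses + n_env_accesses)
```

With six accesses, `access_cost(0, 6, True)` and `access_cost(3, 3, True)` both give 16, while `access_cost(6, 0, False)` gives 6: in this model the saving comes entirely from skipping the setup, which only happens when every captured variable is read-only.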

The second question is also answered in Clement's excellent blog. Yes, there is another way to break the synchrony between the original variable and its copy in the block's stack: the debugger (as aka.nice would have told us!). If the programmer modifies the variable in the enclosing context, the debugger must detect the change and update the copy as well. The same applies if the programmer modifies the copy held in the block's stack.
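What the debugger has to do can be sketched as follows, assuming it can enumerate every closure that holds a copy of a given temporary of the activation being edited. `Activation`, `make_closure` and `debugger_set_temp` are hypothetical names; this is not how any particular VM organizes the bookkeeping.

```python
class Activation:
    """Hypothetical method activation: named temporaries plus closures holding copies."""

    def __init__(self, temps):
        self.temps = dict(temps)
        # temp name -> list of (copied_values_list, index) locations holding a copy
        self.copies = {}

    def make_closure(self, names):
        """Create a closure's copied values and remember where each copy lives."""
        copied = [self.temps[n] for n in names]
        for i, n in enumerate(names):
            self.copies.setdefault(n, []).append((copied, i))
        return copied


def debugger_set_temp(activation, name, new_value):
    """Assign a temporary from the debugger, keeping every copy in sync."""
    activation.temps[name] = new_value
    for copied, index in activation.copies.get(name, []):
        copied[index] = new_value
```

The same propagation applies when the programmer edits the copy from the block's side: the debugger has to write the new value back to the original temporary and to every other copy.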

I'm glad I decided to post the question here. The help I received from aka.nice and Clement Bera, plus the comments some people sent me by email, greatly improved my understanding.

One final remark. Wirfs-Brock claims that avoiding the reification of method contexts is mandatory. I tend to agree. However, many important operations on these data structures can be better implemented if the reification follows the lightweight pattern. More precisely, when debugging you can model these contexts with "viewers" that point to the native stack and use two indexes to delimit the portion that corresponds to the activation under analysis. This is both efficient and clean, and combining both techniques gives you the best of both worlds: speed and expressiveness at once. Smalltalk is amazing.
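The "viewer" idea can be sketched with the native stack modeled as a flat Python list. A context is then just a reference to that list plus two indexes delimiting one activation's slots, so creating it copies nothing and writes through it land directly on the stack. `ContextView`, `at` and `at_put` are hypothetical names (the last two loosely mirror Smalltalk's `at:` and `at:put:`).

```python
class ContextView:
    """Lightweight view of one activation on a (simulated) native stack."""

    def __init__(self, stack, start, end):
        if not 0 <= start <= end <= len(stack):
            raise ValueError("invalid frame bounds")
        self.stack = stack  # shared with the 'native' stack, not copied
        self.start = start  # first slot of this activation
        self.end = end      # one past its last slot

    def __len__(self):
        return self.end - self.start

    def at(self, i):
        """Read slot i of this activation (0-based, relative to the frame)."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.stack[self.start + i]

    def at_put(self, i, value):
        """Write slot i; the change is visible on the stack itself."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        self.stack[self.start + i] = value
```

The sketch ignores what happens once the activation returns and its slots are reused; a real implementation would presumably need to invalidate or reify such views at that point.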

answered Oct 24 '22 at 09:10 by Leandro Caniglia