Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compiling local variables for a stack machine

I'm building a toy compiler from a C like language to a stack machine and I'm at the point where I need to figure out what to do with functions and block local variables. Thinking through it abstractly it looks like I have two options at opposite ends of the spectrum: 1) Preprocess and preallocate stack space for each variable, 2) Add special instructions to the VM to walk the stack.

Pre-process and preallocate stack space for each variable

This has the advantage of giving me all the addresses for the variables ahead of time so I don't have to be very clever or add any extra instructions to the VM for walking the stack. The downside is that it is potentially very wasteful because conditional code that never executes but declares a whole bunch of variables will take up a lot of unnecessary space. For example,

a : t1 = value;
if (test) {
  b : t2; c : t3; d : t4; ...;
}

In the above code even if test is always false I will still allocate space for all those variables within the conditional branch.

Add special instructions to the VM to walk the stack

The other approach I could come up with was to generate code for each variable declaration and then add some special VM instructions to figure out the address of those variables at runtime. This solves the problem of wasted stack space but then adds a computational overhead that I can potentially solve by some caching methods.

So what's the correct approach and is there another approach I didn't think of that is better?

like image 393
David K. Avatar asked Jul 19 '14 03:07

David K.


1 Answers

The idea of a stack machine is that it does computations on an operand stack. It doesn't mean everything has to be stored on a stack. This is a common misconception. Typically your local vaiables / block scoped access maps to a register operation.

.NET CLR and Java both have instructions to store and fetch "local" variables as well as other types of variables. I would suggest you follow suit, because you don't want to walk the stack for simple variable access. That is very inefficient. It should be efficient to load / store variables, like a register. Most stack machines still have random access storage.

In the CLR, we also pre-allocate all of our local variables at the beginning of each method. The variables you preallocate may be a mix of explicit high level variables and compiler generated temporaries. But nothing says they have to be on a stack. On the VMs I've worked on we implemented them in a fast access area like an associative array or a vector like structure. I recommend that you use ildasm to disassemble a .NET method and take note of how the local variables are declared and handled.

Example:

 total = apples + oranges

Maps to:

 ldloc 'apples'   # load a local onto stack
 ldloc 'oranges'  # load a local onto stack
 add              # add 2 operands on stack
 stloc 'total'    # store local from stack

In a previous answer I gave an explanation of stack based machines and compared to register machines. I hope there is some useful information in it. https://stackoverflow.com/a/24301283/257090

It is fairly straightforward to implement a prototype of stfld (store field) and ldfld (load field) with a simple Dictionary or HashTable. Later, you can optimize the assemble or runtime to compile symbolic named based references to integer references, especially if there is no need for runtime lookup of variables by name. However, for reflection APIs, you will need to implement additional metadata to cross-reference the numerical or pointer addresses back to their original names.

PS: If I've misunderstood your question, or you want to discuss more, please comment.

like image 118
codenheim Avatar answered Sep 26 '22 18:09

codenheim