I am working on a toy language that compiles to C++ based on lisp (very small subset of scheme), I am trying to figure out how to represent let expression,
(let ((var 10)
(test 12))
(+ 1 1)
var)
At first I thought execute all exprs then return the last one but returning will kill my ability to nest let expressions, what would be the way to go for representing let?
Also, any resources on source to source transformation is appriciated, I have googled but all I could fing is the 90 min scheme compiler.
One way to expand let
is to treat it as a lambda
:
((lambda (var test) (+ 1 1) var) 10 12)
Then, transform this to a function and a corresponding call in C++:
int lambda_1(int var, int test) {
1 + 1;
return var;
}
lambda_1(10, 12);
So in a larger context:
(display (let ((var 10)
(test 12))
(+ 1 1)
var))
becomes
display(lambda_1(10, 12));
There are a lot more details, such as needing to access lexical variables outside the let
from within the let
. Since C++ doesn't have lexically nested functions (unlike Pascal, for example), this will require additional implementation.
I'll try to explain a naive approach to compiling nested
lambda
s. Since Greg's explanation of expanding let
into a lambda
is very good, I won't address let
at all, I'll assume that let
is
a derived form or macro and is expanded into a lambda
form that is
called immediately.
Compiling Lisp or Scheme functions directly into C or C++ functions will be tricky due to the issues other posters raised. Depending on the approach, the resulting C or C++ won't be recognizeable (or even very readable).
I wrote a Lisp-to-C compiler after finishing Structure and Interpretation of Computer Programs (it's one of the final exercises, and actually I cheated and just wrote a translator from SICP byte code to C). The subset of C that it emitted didn't use C functions to handle Lisp functions at all. This is because the register machine language in chapter 5 of SICP is really lower level than C.
Assuming that you have some form of environments, which bind names to values, you can define the crux of function calling like this: extend the environment which the function was defined in with the formal parameters bound to the arguments, and then evaluate the body of the function in this new environment.
In SICP's compiler, the environment is held in a global variable, and there are other global variables holding the argument list for a function call, as well as the procedure object that is being called (the procedure object includes a pointer to the environment in which it was defined), and a label to jump to when a function returns.
Keep in mind that when you are compiling a lambda
expression, there
are two syntactic components you know at compile-time: the formal
parameters and the body of the lambda
.
When a function is compiled, the emitted code looks something like this pseudocode:
some-label
*env* = definition env of *proc*
*env* = extend [formals] *argl* *env*
result of compiling [body]
...
jump *continue*
... where *env*
and *argl*
are the global variables holding the
environment and argument list, and extend
is some function (this can
be a proper C++ function) that extends the environment *env*
by
pairing up names in *argl*
with values in [formals]
.
Then, when the compiled code is run, and there is a call to this
lambda
somewhere else in your code, the calling convention is to put
the result of evaluating the argument list into the *argl*
variable, put the return label in the *continue*
variable, and then jump to some-label
.
In the case of nested lambda
s, the emitted code would look something
like this:
some-label
*env* = definition env of *proc*
*env* = extend [formals-outer] *argl* *env*
another-label
*env* = definition env of *proc*
*env* = extend [formals-inner] *argl* *env*
result of compiling [body-inner]
...
jump *continue*
rest of result of compiling [body-outer]
... somewhere in here there might be a jump to another-label
jump *continue*
This is a bit difficult to explain, and I'm sure I've done a muddled job of it. I can't think of a decent example that doesn't involve me basically sloppily describing the whole chapter 5 of SICP. Since I spent the time to write this answer, I'm going to post it, but I'm very sorry if it's hopelessly confusing.
I strongly recommend SICP and Lisp in Small Pieces.
SICP covers metacircular interpretation for beginners, as well as a number of variants on the interpreter, and a byte code compiler which I have managed to obfuscate and mangle above. That's just the last two chapters, the first 3 chapters are just as good. It's a wonderful book. Absolutely read it if you haven't yet.
L.i.S.P includes a number of interpreters written in Scheme, a compiler to byte code and a compiler to C. I'm in the middle of it and can say with confidence it's a deep, rich book well worth the time of anyone interested in the implementation of Lisp. It may be a bit dated by this point, but for a beginner like me, it's still valuable. It's more advanced than SICP though, so beware. It incudes a chapter in the middle on denotational semantics which went basically right over my head.
Some other notes:
Darius Bacon's self-hosting Lisp to C compiler
lambda lifting, which is a more advanced technique that I think Marc Feeley uses
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With