I find myself attached to a project to integerate an interpreter into an existing application. The language to be interpreted is a derivative of Lisp, with application-specific builtins. Individual 'programs' will be run batch-style in the application.
I'm surprised that over the years I've written a couple of compilers, and several data-language translators/parsers, but I've never actually written an interpreter before. The prototype is pretty far along, implemented as a syntax tree walker, in C++. I can probably influence the architecture beyond the prototype, but not the implementation language (C++). So, constraints:
What I'm really looking for is reading material on the fundamentals of implementing interpreters. I did some browsing of SO, and another site known as Lambda the Ultimate, though they are more oriented toward programming language theory.
Some of the tidbits I've gathered so far:
Lisp in Small Pieces, by Christian Queinnec. The person recommending it said it "goes from the trivial interpreter to more advanced techniques and finishes presenting bytecode and 'Scheme to C' compilers."
NekoVM. As I've mentioned above, I doubt that we'd be allowed to incorporate an entire VM framework to support this project.
Structure and Interpretation of Computer Programs. Originally I suggested that this might be overkill, but having worked through a healthy chunk, I agree with @JBF. Very informative, and mind-expanding.
On Lisp by Paul Graham. I've read this, and while it is an informative introduction to Lisp principles, is not enough to jump-start constructing an interpreter.
Parrot Implementation. This seems like a fun read. Not sure it will provide me with the fundamentals.
Scheme from Scratch. Peter Michaux is attacking various implementations of Scheme, from a quick-and-dirty Scheme interpreter written in C (for use as a bootstrap in later projects) to compiled Scheme code. Very interesting so far.
Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, recommended in the comment thread for Books On Creating Interpreted Languages. The book contains two chapters devoted to the practice of building interpreters, so I'm adding it to my reading queue.
So how about it? Is there a good book that takes the neophyte by the hand and shows how to build an interpreter in C/C++ for a Lisp-like language? Do you have a preference for syntax-tree walkers or bytecode interpreters?
To answer @JBF:
the current prototype is an interpreter, and it makes sense to me as we're accepting a path to an arbitrary code file and executing it in our application environment. The builtins are used to affect our in-memory data representation.
it should not be hideously slow. The current tree walker seems acceptable.
The language is based on Lisp, but is not Lisp, so no standards compliance required.
To the other posters, I'll be checking out your citations as well. Thanks, all!
To create an interpreter first you need to create a lexer to get the tokens of your input program. Next you create a parser that takes those tokens and, by following the rules of a formal grammar, returns an AST of your input program. Finally, the interpreter takes that AST and interprets it in some way.
Interpreter translates just one statement of the program at a time into machine code. Compiler scans the entire program and translates the whole of it into machine code at once. An interpreter takes very less time to analyze the source code. However, the overall time to execute the process is much slower.
Bytecode interpreters This "compiled" code is then interpreted by a bytecode interpreter (itself written in C). The compiled code in this case is machine code for a virtual machine, which is implemented not in hardware, but in the bytecode interpreter.
C, C++, C#, etc are programming languages that are compiler-based. Python, Ruby, Perl, SNOBOL, MATLAB, etc are programming languages that are interpreter-based.
Short answer:
The fundamental reading list for a lisp interpreter is SICP. I would not at all call it overkill, if you feel you are overqualified for the first parts of the book jump to chapter 4 and start interpreting away (although I feel this would be a loss since chapters 1-3 really are that good!).
Add LISP in Small Pieces (LISP from now on), chapters 1-3. Especially chapter 3 if you need to implement any non-trivial control forms.
See this post by Jens Axel Søgaard on a minimal self-hosting Scheme: http://www.scheme.dk/blog/2006/12/self-evaluating-evaluator.html .
A slightly longer answer:
It is hard to give advice without knowing what you require from your interpreter.
If you need anything more fancy than a simple syntax tree walker I would strongly recommend embedding a fast scheme subsystem. Gambit scheme comes to mind: http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_Page .
If that is not an option chapter 5 in SICP and chapters 5-- in LISP target compilation for faster execution.
For faster interpretation I would take a look at the most recent JavaScript interpreters/compilers. There seem to be a lot of thought going into fast JavaScript execution, and you can probably learn from them. V8 cites two important papers: http://code.google.com/apis/v8/design.html and squirrelfish cites a couple: http://webkit.org/blog/189/announcing-squirrelfish/ .
There is also the canonical scheme papers: http://library.readscheme.org/page1.html for the RABBIT compiler.
If I engage in a bit of premature speculation, memory management might be the tough nut to crack. Nils M Holm has published a book "Scheme 9 from empty space" http://www.t3x.org/s9fes/ which includes a simple stop-the-world mark and sweep garbage collector. Source included.
John Rose (of newer JVM fame) has written a paper on integrating Scheme to C: http://library.readscheme.org/servlets/cite.ss?pattern=AcmDL-Ros-92 .
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With