I am interested in writing a very minimalistic compiler.
I want to write a small piece of software (in C/C++) that fulfills the following criteria:
Language features:
Can anybody tell me how to start? I don't know what parts a compiler consists of (at least not well enough to start right away) or how to program them. Thank you for your ideas.
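A compiler is a language processor that reads the complete source program, written in a high-level language, in one go and translates it into an equivalent program in machine language. Examples of languages that are usually compiled include C, C++, C# and Java (C# and Java are normally compiled to bytecode rather than directly to machine code).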
A compiler is a special program that translates a programming language's source code into machine code, bytecode or another programming language. The source code is typically written in a high-level, human-readable language such as Java or C++.
A compiler is likely to perform some or all of the following operations, often called phases: preprocessing, lexical analysis, parsing, semantic analysis (syntax-directed translation), conversion of input programs to an intermediate representation, code optimization and code generation.
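To make the lexical-analysis phase concrete, here is a minimal sketch of a hand-written lexer in C for the kind of toy statements suggested later in this thread (print "hello", a = 5). It turns raw characters into tokens that a parser would consume next. The token set and the names TokenKind, Token, Lexer and next_token are hypothetical choices for illustration, not part of any existing library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds for toy statements such as `print "hello"` and `a = 5`. */
typedef enum {
    TOK_IDENT, TOK_NUMBER, TOK_STRING, TOK_EQUALS,
    TOK_NEWLINE, TOK_EOF, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* points into the source text (not NUL-terminated) */
    size_t length;
} Token;

typedef struct {
    const char *cur;     /* current read position in the source */
} Lexer;

/* Scan and return the next token, advancing the lexer past it. */
static Token next_token(Lexer *lx) {
    assert(lx != NULL && lx->cur != NULL);
    /* Skip blanks but keep '\n': in this toy language a statement ends at a newline. */
    while (*lx->cur != '\0' && strchr(" \t\r", *lx->cur) != NULL)
        lx->cur++;

    const char *p = lx->cur;
    Token t = { TOK_EOF, p, 0 };

    if (*p == '\0') {
        return t;                                  /* end of input */
    } else if (*p == '\n') {
        t.kind = TOK_NEWLINE; p++;
    } else if (*p == '=') {
        t.kind = TOK_EQUALS; p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT;                        /* keywords like `print` too */
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (*p == '"') {
        t.kind = TOK_STRING;
        t.start = ++p;                             /* token text excludes the quotes */
        while (*p != '"' && *p != '\0' && *p != '\n') p++;
        t.length = (size_t)(p - t.start);
        if (*p == '"') p++;                        /* consume the closing quote */
        else t.kind = TOK_ERROR;                   /* unterminated string */
        lx->cur = p;
        return t;
    } else {
        t.kind = TOK_ERROR; p++;                   /* unexpected character */
    }
    t.length = (size_t)(p - t.start);
    lx->cur = p;
    return t;
}
```

A parser would call next_token repeatedly, starting from something like Lexer lx = { source }; and stop at TOK_EOF. Keywords such as print come back as ordinary TOK_IDENT tokens; the parser can recognize them by comparing the token text.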
A compiler is a computer program that translates source code written in a high-level language into low-level machine language. More generally, it translates code written in one programming language into another language without changing the program's meaning.
With all that you hope to accomplish, the most challenging requirement might be "very small (max. 1-2 KLOC)". I think your first requirement alone (generating ELF output) might take well over a thousand lines of code by itself.
One way to simplify the problem, at least to start with, is to generate assembly-language text that you then feed into an existing assembler (NASM would be a good choice). The assembler, together with a linker such as ld, would take care of generating the actual machine code as well as all the ELF-specific details required to build a runnable executable. Your job is then reduced to parsing the language and generating assembly code. When your project matures to the point where you want to remove the dependency on an assembler, you can write that part yourself and plug it in at any time.
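A minimal sketch of this approach, assuming x86-64 Linux and NASM syntax: the compiler's back end only prints assembly text, and NASM plus a linker do the rest. The function name emit_hello_program is hypothetical; the syscall numbers used (write = 1, exit = 60) are the standard x86-64 Linux ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a complete NASM program for x86-64 Linux that prints `msg` plus a
   newline via the write syscall and then exits with status 0.
   Assumption: `msg` is non-empty and contains no '"', so it can be placed
   inside a NASM string literal unchanged. */
static void emit_hello_program(FILE *out, const char *msg) {
    assert(out != NULL && msg != NULL);
    assert(msg[0] != '\0' && strchr(msg, '"') == NULL);
    fputs("section .data\n", out);
    fprintf(out, "msg: db \"%s\", 10\n", msg);        /* 10 is '\n' */
    fputs("msg_len equ $ - msg\n\n", out);
    fputs("section .text\n"
          "global _start\n"
          "_start:\n"
          "    mov rax, 1          ; syscall number of write\n"
          "    mov rdi, 1          ; file descriptor 1 = stdout\n"
          "    mov rsi, msg        ; address of the bytes\n"
          "    mov rdx, msg_len    ; number of bytes\n"
          "    syscall\n"
          "    mov rax, 60         ; syscall number of exit\n"
          "    xor rdi, rdi        ; exit status 0\n"
          "    syscall\n", out);
}
```

Calling emit_hello_program(stdout, "hello") from a small main and redirecting the output to hello.asm, you could then build it with nasm -f elf64 hello.asm -o hello.o followed by ld hello.o -o hello. A real code generator emits the same kind of text, only driven by the parsed program instead of a fixed template.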
If I were you, I might start with an assembler and build pieces on top of it. The simplest "compiler" might take a language with just a few very simple possible statements:
print "hello"
a = 5
print a
and translate that to assembly language. Once you get that working, you can build a lexer, a parser, an abstract syntax tree and a code generator, which are most of the parts you'll need for a modern block-structured language.
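As a sketch of that first step, here is a tiny translator in C for exactly the three statement forms above (print "text", name = number, print name). It works line by line with no separate AST and emits NASM assembly. Assumptions, all hypothetical choices for illustration: x86-64 Linux (write = 1, exit = 60), unsigned 64-bit integer variables only, at most 64 variables and 255-character lines, and the names compile_program, compile_line, print_u64 and the v_ label prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_VARS 64   /* hypothetical limits chosen for this sketch */
#define MAX_NAME 32
static char vars[MAX_VARS][MAX_NAME];   /* symbol table: assigned variable names */
static int nvars;

/* Runtime routine emitted once: prints rax as an unsigned decimal plus '\n'. */
static const char PRINT_U64[] =
    "print_u64:\n"
    "    lea rsi, [numbuf + 20]\n    mov byte [rsi], 10\n    mov rcx, 10\n"
    ".next:\n"
    "    dec rsi\n    xor rdx, rdx\n    div rcx\n    add dl, '0'\n"
    "    mov [rsi], dl\n    test rax, rax\n    jnz .next\n"
    "    mov rax, 1\n    mov rdi, 1\n    lea rdx, [numbuf + 21]\n"
    "    sub rdx, rsi\n    syscall\n    ret\n";

static int lookup(const char *name) {
    for (int i = 0; i < nvars; i++)
        if (strcmp(vars[i], name) == 0) return i;
    return -1;
}

static const char *skip_ws(const char *p) {
    while (*p == ' ' || *p == '\t') p++;
    return p;
}

/* Copy an identifier at *p into out, advance *p; returns 0 if none starts there. */
static size_t read_ident(const char **p, char out[MAX_NAME]) {
    size_t n = 0;
    if (!isdigit((unsigned char)**p))   /* identifiers may not start with a digit */
        while ((isalnum((unsigned char)**p) || **p == '_') && n < MAX_NAME - 1)
            out[n++] = *(*p)++;
    out[n] = '\0';
    return n;
}

/* Translate one statement to NASM text; returns 0 on success, -1 on error. */
static int compile_line(const char *p, FILE *out, int *nstrings) {
    char name[MAX_NAME], var[MAX_NAME];
    p = skip_ws(p);
    if (*p == '\0') return 0;                          /* blank line */
    if (read_ident(&p, name) == 0) return -1;
    p = skip_ws(p);
    if (strcmp(name, "print") == 0 && *p == '"') {     /* print "text" (non-empty) */
        const char *q = strchr(p + 1, '"');
        if (!q || q == p + 1 || *skip_ws(q + 1) != '\0') return -1;
        int id = (*nstrings)++;
        fprintf(out, "section .data\nstr%d: db \"%.*s\", 10\nstr%d_len equ $ - str%d\n"
                     "section .text\n    mov rax, 1\n    mov rdi, 1\n"
                     "    mov rsi, str%d\n    mov rdx, str%d_len\n    syscall\n",
                id, (int)(q - p - 1), p + 1, id, id, id, id);
        return 0;
    }
    if (strcmp(name, "print") == 0) {                  /* print variable */
        if (read_ident(&p, var) == 0 || *skip_ws(p) != '\0' || lookup(var) < 0)
            return -1;                                 /* syntax error or unassigned */
        fprintf(out, "    mov rax, [v_%s]\n    call print_u64\n", var);
        return 0;
    }
    if (*p != '=') return -1;                          /* name = number */
    p = skip_ws(p + 1);
    if (!isdigit((unsigned char)*p)) return -1;
    char *end;
    unsigned long long value = strtoull(p, &end, 10);  /* overflow not checked */
    if (*skip_ws(end) != '\0') return -1;
    if (lookup(name) < 0) {
        if (nvars == MAX_VARS) return -1;
        strcpy(vars[nvars++], name);
    }
    fprintf(out, "    mov rax, %llu\n    mov [v_%s], rax\n", value, name);
    return 0;
}

/* Compile a whole program; returns 0 on success, -1 at the first error. */
static int compile_program(const char *src, FILE *out) {
    char line[256];
    int nstrings = 0;
    assert(src != NULL && out != NULL);
    nvars = 0;
    fputs("section .text\nglobal _start\n_start:\n", out);
    while (*src) {
        size_t len = strcspn(src, "\n");
        if (len >= sizeof line) return -1;
        memcpy(line, src, len);
        line[len] = '\0';
        if (compile_line(line, out, &nstrings) != 0) return -1;
        src += len + (src[len] == '\n');               /* skip the newline, if any */
    }
    fputs("    mov rax, 60\n    xor rdi, rdi\n    syscall\n", out);   /* exit(0) */
    fputs(PRINT_U64, out);
    fputs("section .bss\nnumbuf: resb 21\n", out);
    for (int i = 0; i < nvars; i++) fprintf(out, "v_%s: resq 1\n", vars[i]);
    return 0;
}
```

Writing the output of compile_program for the three-line example to prog.asm, then running nasm -f elf64 prog.asm -o prog.o and ld prog.o -o prog, should give a program that prints hello and then 5. Variables get a v_ prefix so that a variable named, say, rax cannot collide with a register name. Once this works, compile_line is the piece you would replace with a proper lexer, parser and AST.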
Good luck!