
Compiler-Programming: What are the most fundamental ingredients?

I am interested in writing a very minimalistic compiler.

I want to write a small piece of software (in C/C++) that fulfills the following criteria:

  • output in ELF format (*nix)
  • input is a single textfile
  • C-like grammar and syntax
  • no linker
  • no preprocessor
  • very small (max. 1-2 KLOC)

Language features:

  • native data types: char, int, and float
  • arrays (for all native data types)
  • variables
  • control structures (if-else)
  • functions
  • loops (would be nice)
  • simple algebra (div, add, sub, mul, boolean expressions, bit-shift, etc.)
  • inline asm (for system calls)

Can anybody tell me how to start? I don't know what parts a compiler consists of (at least not well enough to just sit down and start writing one) or how to program them. Thank you for your ideas.

asked Feb 17 '09 22:02 by prinzdezibel



1 Answer

With all that you hope to accomplish, the most challenging requirement might be "very small (max. 1-2 KLOC)". I think your first requirement alone (generating ELF output) might take well over a thousand lines of code by itself.

One way to simplify the problem, at least to start with, is to generate assembly language text that you then feed into an existing assembler (nasm would be a good choice). The assembler, together with a linker such as ld, would take care of generating the actual machine code as well as all the ELF-specific structure required to build a runnable executable. Then your job is reduced to language parsing and assembly code generation. When your project matures to the point where you want to remove the dependency on an external assembler, you can write that part yourself and plug it in at any time.
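To make the "emit assembly text" idea concrete, here is a minimal sketch in C++. The function name emit_exit_program is made up for this example, and the output assumes NASM syntax on x86-64 Linux, where system call number 60 is exit. It is only meant to show the shape of such a code generator, not a finished back end.

```cpp
#include <string>

// Hypothetical helper (not from the original answer): returns NASM source
// for a Linux x86-64 program that does nothing except exit with `status`.
std::string emit_exit_program(int status) {
    std::string out;
    out += "global _start\n";        // entry point symbol expected by ld
    out += "section .text\n";
    out += "_start:\n";
    out += "    mov eax, 60\n";      // system call number for exit
    out += "    mov edi, " + std::to_string(status) + "\n";  // exit status
    out += "    syscall\n";
    return out;
}
```

Writing the returned text to a file such as out.asm and running `nasm -f elf64 out.asm -o out.o` followed by `ld out.o -o out` should give an executable that exits with the chosen status. Everything ELF-related is handled by those two tools.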

If I were you, I might start with an assembler and build pieces on top of it. The simplest "compiler" might take a language with just a few very simple possible statements:

print "hello"
a = 5
print a

and translate that to assembly language. Once you get that working, you can build a lexer, a parser, an abstract syntax tree, and a code generator, which are most of the parts you'll need for a modern block-structured language.
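A line-by-line translator for that three-statement language could look roughly like the sketch below. All names in it (translate, is_name, print_int, numbuf, the var_ and str prefixes) are invented for illustration. It assumes NASM syntax on x86-64 Linux, non-negative integer values, string literals without embedded quotes, and spaces around the "=" sign. There is no real lexer or parser yet, just enough string handling to show the idea.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Runtime helper emitted into every program: prints the unsigned integer in
// rdi as decimal digits plus a newline, using the Linux write system call.
static const char* const kPrintInt = R"(print_int:
    mov rax, rdi
    lea rsi, [numbuf + 31]
    mov byte [rsi], 10
    mov ecx, 10
.next_digit:
    xor edx, edx
    div rcx
    add dl, '0'
    dec rsi
    mov [rsi], dl
    test rax, rax
    jnz .next_digit
    lea rdx, [numbuf + 32]
    sub rdx, rsi
    mov eax, 1
    mov edi, 1
    syscall
    ret
)";

// True if s looks like an identifier: a letter, then letters/digits/underscores.
static bool is_name(const std::string& s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) return false;
    for (unsigned char c : s)
        if (!std::isalnum(c) && c != '_') return false;
    return true;
}

// Translates the toy language (one statement per line) into NASM source text.
std::string translate(const std::string& source) {
    std::ostringstream data, text;
    std::set<std::string> vars;   // variables assigned so far
    int string_count = 0;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string first;
        if (!(words >> first)) continue;            // skip blank lines
        if (first == "print") {
            std::string arg;
            std::getline(words >> std::ws, arg);
            while (!arg.empty() && std::isspace(static_cast<unsigned char>(arg.back())))
                arg.pop_back();                     // drop trailing whitespace
            if (arg.size() >= 2 && arg.front() == '"' && arg.back() == '"') {
                // String literal: place it in .data, then write(1, label, len).
                const std::string label = "str" + std::to_string(string_count++);
                data << label << ": db " << arg << ", 10\n"
                     << label << "_len equ $ - " << label << "\n";
                text << "    mov eax, 1\n    mov edi, 1\n"
                     << "    mov rsi, " << label << "\n"
                     << "    mov edx, " << label << "_len\n    syscall\n";
            } else if (vars.count(arg)) {
                text << "    mov rdi, [var_" << arg << "]\n    call print_int\n";
            } else {
                throw std::runtime_error("unknown variable: " + arg);
            }
        } else {
            // Assignment of the form: name = integer
            std::string eq;
            long long value = 0;
            if (!is_name(first) || !(words >> eq >> value) || eq != "=" || value < 0)
                throw std::runtime_error("cannot parse: " + line);
            if (vars.insert(first).second)          // first assignment: reserve storage
                data << "var_" << first << ": dq 0\n";
            text << "    mov rax, " << value << "\n"
                 << "    mov [var_" << first << "], rax\n";
        }
    }
    return "global _start\nsection .data\n" + data.str() +
           "section .bss\nnumbuf: resb 32\nsection .text\n" + kPrintInt +
           "_start:\n" + text.str() +
           "    mov eax, 60\n    xor edi, edi\n    syscall\n";
}
```

Calling translate on the three example lines and assembling the result with nasm and ld, as in the earlier step, is the intended use. Replacing the ad hoc string handling with a proper lexer and parser is the natural next stage once more statement forms are added.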

Good luck!

answered Sep 20 '22 08:09 by Greg Hewgill