Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple, formally defined language for compiler learning

I'm looking for a simple, formally defined language that can be used while learning about compiler construction. It should be simple to implement a first pass and then be amenable to further optimization efforts.

Feel free to point me in the direction of lisps, but I'm specifically looking for other options as well.

like image 305
Yehuda Katz Avatar asked Feb 09 '11 10:02

Yehuda Katz


People also ask

What is the easiest compiled language to learn?

Java. Java is also one of the easiest programming languages to learn. There are tons of online resources to learn Java, one of the easiest and best programming languages to learn. Java is many developers' first exposure to the principles of Object-Oriented design.

What language is used for compilers?

Compilers analyze and convert source code written in languages such as Java, C++, C# or Swift. They're commonly used to generate machine code or bytecode that can be executed by the target host system. Interpreters do not generate IR code or save generated machine code.

What is formal language in compiler design?

Formal language is a set of words ,i.e. finite strings of letters,symbols or tokens. In computer science they are used for the precise definition of data formats and the syntax of programming languages.

How do you define a formal language?

Definition of 'formal language' 1. a language designed for use in situations in which natural language is unsuitable, as for example in mathematics, logic, or computer programming. The symbols and formulas of such languages stand in precisely specified syntactic and semantic relations to one another.


4 Answers

May I suggest the Jack programming language from http://www.nand2tetris.org/?

It's especially suited for learning compiler construction, as it's part of an academic course.

I am in the midst of a blog post series on writing a compiler for this language, in C#, with code-generation to C. The posts I have already are here: http://blogs.microsoft.co.il/blogs/sasha/archive/tags/Compiler/default.aspx

like image 153
Sasha Goldshtein Avatar answered Sep 22 '22 12:09

Sasha Goldshtein


I thought Chapter 8 of Kernighan and Pike's The Unix Programming Environment was excellent. It covers much of programming in the Unix environment, all while implementing a programming language.

Chapter 8 is called Program Development. It discusses the development of a non-trivial program through various stages of design. That non-trivial program is hoc, the High-Order Calculator. For more details on hoc, see http://en.wikipedia.org/wiki/Hoc_(programming_language)

It is a great, practical introduction to implementing a simple language using the standard tools yacc and lex. yacc and lex are too much to cover here, but by following the examples in this book and doing the exercises you will develop an understanding of them.

The development lasts through various phases; in the first phase you don't even have variables in the language. By the third stage you have variables, defined constants (PI, E, etc.), and built-in functions like sin() and log(). By the last stage you have the fully implemented language.

Now, is hoc the best language to try and implement? I have no clue, but I know that The Unix Programming Environment was an excellent book to read in parallel with a traditional compiler book. As I began reading the Aho compiler book (dragon book) I re-read chapter 8 of TUPE and followed the examples and exercises. Sure, anyone can re-type the code from the book, but the exercises require that you have a good understanding of what is going on.

In the end I don't think it matters exactly which language you choose to do, but the process you follow while implementing it.

like image 26
Mr. Shickadance Avatar answered Sep 18 '22 12:09

Mr. Shickadance


I would suggest Wirth's PL/0.

Why?

  • The grammar is small, but still there is enough there to get a good taste for developing a compiler:

    program =
        block "." .
    
    block =
        [ "const" ident "=" number {"," ident "=" number} ";"]
            [ "var" ident {"," ident} ";"]
            { "procedure" ident ";" block ";" } statement .
    
    statement =
        [ ident ":=" expression
        | "call" ident
        | "begin" statement {";" statement } "end"
        | "if" condition "then" statement
        | "while" condition "do" statement
        ].
    
    condition =
        "odd" expression
        | expression ("="|"#"|"<"|"<="|">"|">=") expression
        .
    
    expression =
        [ "+"|"-"] term { ("+"|"-") term} .
    
    term =
        factor {("*"|"/") factor} .
    
    factor =
        ident | number | "(" expression ")" .
    
  • You can implement a Virtual Machine compiler for PL/0 in C in about 1000 lines of code.

    • Big enough to be non-trivial, but small enough to be doable.
  • There are three books related to it:

    • Wirth, Niklaus (1975), Algorithms + Data Structures = Programs, ISBN 0-13-022418-9 (the original PL/0 specification and implementation (in Pascal)) A great gentle introduction to compiling.

    • Liffick, Blaise W., Ed (1979), The Byte Book of Pascal, ISBN 0-07-037823-1 (the authors develop a minor superset of PL/0, in Northstar Basic for an early CP/M computer).

    • Wirth, Niklaus (1986), Compilerbau, B.G. Teubner, Stuttgart ISBN 3-519-32338-9 (a minor superset of PL/0, implemented in Modula 2. In German).

  • The Web is full of examples.

    • I've found implementations in C, C++, Pascal, Modula 2, Java and Ruby. I'll bet there are even more.
  • There is a Wikipedia entry: :-)

    • http://en.wikipedia.org/wiki/PL/0
  • In addition, a couple of helpful groups, with lots of people willing to help answer your compiler writing questions:

    • groups.yahoo.com/group/compilers101

    • groups.yahoo.com/group/QDepartment

    • Also, the Usenet newsgroup comp.compilers is a good place to get information.

like image 45
Sammy Mitchell Avatar answered Sep 20 '22 12:09

Sammy Mitchell


Oberon specification is small enough for your purposes: http://www-vs.informatik.uni-ulm.de:81/projekte/Oberon-2.Report/

R5RS or a pure functional subset of it is not that big too (if you ignore the numeric tower).

like image 38
SK-logic Avatar answered Sep 22 '22 12:09

SK-logic