
Learning how programming languages work

I've been programming for years (mainly Python), but I don't understand what happens behind the scenes when I compile or execute my code.

In the vein of a question I asked earlier about operating systems, I am looking for a gentle introduction to programming language engineering. I want to be able to define and understand the basics of terms like compiler, interpreter, native code, managed code, virtual machine, and so on. What would be a fun and interactive way to learn about this?

RexE asked Oct 04 '09 08:10


4 Answers

Code to execution in a nutshell

A program (code) is fed into the compiler (or interpreter).

Characters are grouped into tokens (operators such as +, identifiers, numbers), and information about the identifiers is stored in something called a symbol table.
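A minimal sketch of this tokenizing step in Python, assuming a tiny made-up token set (numbers, identifiers, the keyword int, and the operators =, +, * and ;). The names TOKEN_SPEC, KEYWORDS and tokenize are hypothetical, and this toy symbol table only records where each identifier first appears; a real compiler would also store types, scopes, and so on.

```python
import re

# Hypothetical token set for a tiny C-like language (an assumption, not a real spec).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+*;]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"int"}


def tokenize(source):
    """Group characters into (kind, text) tokens and record identifiers in a symbol table."""
    tokens = []
    symbol_table = {}
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        elif kind == "IDENT":
            # Toy symbol table: remember where each identifier first appears.
            symbol_table.setdefault(text, {"first_seen": match.start()})
        tokens.append((kind, text))
    return tokens, symbol_table


tokens, symbols = tokenize("int a = 6 + b * c;")
print(tokens)
print(symbols)  # {'a': {'first_seen': 4}, 'b': {'first_seen': 12}, 'c': {'first_seen': 16}}
```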

The parser then combines these tokens into statements, such as (int a = 6 + b * c;), usually in the form of a syntax tree:

                     =
                    / \
                   /   \ 
                  a     +
                       / \
                      /   \
                     6     *
                          / \
                         b   c
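One common way to build such a tree is a recursive-descent parser, with one function per grammar rule; because the rule for * sits below the rule for +, multiplication binds tighter, which is what puts b * c beneath the + node. This is a sketch under assumptions: the token list is hard-coded as a lexer might produce it, the small grammar is made up for this one statement, and the tree is represented as nested (operator, left, right) tuples.

```python
# Tokens as a lexer might produce them for "int a = 6 + b * c;" (hard-coded so this stands alone).
TOKENS = [
    ("KEYWORD", "int"), ("IDENT", "a"), ("OP", "="), ("NUMBER", "6"),
    ("OP", "+"), ("IDENT", "b"), ("OP", "*"), ("IDENT", "c"), ("OP", ";"),
]


class Parser:
    """Recursive-descent parser: one method per grammar rule, lowest precedence first."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, kind, text=None):
        tok_kind, tok_text = self.peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    # declaration := "int" IDENT "=" expr ";"
    def declaration(self):
        self.expect("KEYWORD", "int")
        name = self.expect("IDENT")
        self.expect("OP", "=")
        value = self.expr()
        self.expect("OP", ";")
        return ("=", name, value)

    # expr := term ("+" term)*
    def expr(self):
        node = self.term()
        while self.peek() == ("OP", "+"):
            self.pos += 1
            node = ("+", node, self.term())
        return node

    # term := factor ("*" factor)*   (binds tighter than "+")
    def term(self):
        node = self.factor()
        while self.peek() == ("OP", "*"):
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    # factor := NUMBER | IDENT
    def factor(self):
        kind, text = self.peek()
        if kind == "NUMBER":
            self.pos += 1
            return int(text)
        if kind == "IDENT":
            self.pos += 1
            return text
        raise SyntaxError(f"unexpected token {text!r}")


tree = Parser(TOKENS).declaration()
print(tree)  # ('=', 'a', ('+', 6, ('*', 'b', 'c')))
```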

In a simple (tree-walking) interpreter, the tree is executed directly.
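A minimal tree-walking interpreter for the tree above, sketched in Python. The tree is written as nested (operator, left, right) tuples, and the values chosen for b and c are hypothetical.

```python
# The syntax tree from the diagram, written as nested tuples: (operator, left, right).
TREE = ("=", "a", ("+", 6, ("*", "b", "c")))


def evaluate(node, env):
    """Walk the tree and execute it directly, reading and writing variables in env."""
    if isinstance(node, int):
        return node                       # a number literal evaluates to itself
    if isinstance(node, str):
        return env[node]                  # an identifier is looked up in the environment
    op, left, right = node
    if op == "=":
        env[left] = evaluate(right, env)  # assignment: evaluate the right side, store it by name
        return env[left]
    if op == "+":
        return evaluate(left, env) + evaluate(right, env)
    if op == "*":
        return evaluate(left, env) * evaluate(right, env)
    raise ValueError(f"unknown operator {op!r}")


env = {"b": 2, "c": 5}  # hypothetical values for b and c
evaluate(TREE, env)
print(env["a"])         # 16, since 6 + 2 * 5 = 16
```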

With a compiler, the tree is eventually translated into either intermediate code or assembly code.
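A sketch of that translation, assuming a made-up stack-machine intermediate code (PUSH_CONST, LOAD, STORE, ADD and MUL are hypothetical instruction names, not any real assembler's). A post-order walk of the tree emits the operands before the operator that consumes them.

```python
TREE = ("=", "a", ("+", 6, ("*", "b", "c")))

OPCODES = {"+": "ADD", "*": "MUL"}


def generate(node, code):
    """Emit instructions for a stack machine by a post-order walk of the tree."""
    if isinstance(node, int):
        code.append(("PUSH_CONST", node))
    elif isinstance(node, str):
        code.append(("LOAD", node))
    else:
        op, left, right = node
        if op == "=":
            generate(right, code)             # compute the value first...
            code.append(("STORE", left))      # ...then pop it into the variable
        else:
            generate(left, code)              # both operands are pushed first,
            generate(right, code)
            code.append((OPCODES[op], None))  # then the operator pops them and pushes the result
    return code


for instruction in generate(TREE, []):
    print(instruction)
```

For this tree the listing is PUSH_CONST 6, LOAD b, LOAD c, MUL, ADD, STORE a: the multiplication is emitted before the addition because it sits lower in the tree.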

You now have one or more "object files". These contain the translated machine code, but without the precise addresses of jumps (because these values are not known yet, especially if the targets are in other object files). The object files are combined by a linker, which fills in the blanks for the jumps (and other references). The output of the linker is a library (which can itself be linked) or an executable file.
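A toy model of the linker's job in Python, assuming a made-up object-file format: each file lists its instructions plus the symbols it defines, and a CALL or JUMP operand that names a symbol is an unresolved reference. The file contents and the function link are hypothetical; real linkers also relocate addresses internal to each file, which this sketch leaves out.

```python
# Two hypothetical object files. Each has code plus a table of the symbols it defines
# (name -> offset within that file). Operands that name a symbol are unresolved references.
main_o = {
    "code": [("CALL", "square"), ("PRINT", None), ("HALT", None)],
    "defines": {"main": 0},
}
math_o = {
    "code": [("DUP", None), ("MUL", None), ("RET", None)],
    "defines": {"square": 0},
}


def link(object_files):
    """Concatenate object files, then replace symbolic CALL/JUMP targets with final addresses."""
    symbols = {}
    base = 0
    # Pass 1: give each file a base address and record where every symbol ends up.
    for obj in object_files:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol {name!r}")
            symbols[name] = base + offset
        base += len(obj["code"])
    # Pass 2: copy the code and fill in the blanks left by the compiler.
    executable = []
    for obj in object_files:
        for opcode, operand in obj["code"]:
            if opcode in ("CALL", "JUMP"):
                if operand not in symbols:
                    raise ValueError(f"undefined reference to {operand!r}")
                operand = symbols[operand]
            executable.append((opcode, operand))
    return executable, symbols


program, symbol_table = link([main_o, math_o])
print(symbol_table)  # {'main': 0, 'square': 3}
print(program[0])    # ('CALL', 3)
```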

When you start the executable, the program is copied into memory, and some further address juggling (relocation) matches the pointers with the correct memory locations. Then control is passed to the program's first instruction (its entry point).
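A toy model of that loading step in Python, assuming a made-up image format in which a relocation list records which instructions hold addresses. IMAGE, RELOCATIONS and load are hypothetical names, memory is simulated with a dict, and the entry point is assumed to be the first instruction; real loaders (with virtual memory, dynamic linking, and so on) are far more involved.

```python
# A hypothetical executable image whose CALL operand is an address relative to the image start.
IMAGE = [("CALL", 3), ("PRINT", None), ("HALT", None), ("DUP", None), ("MUL", None), ("RET", None)]
RELOCATIONS = [0]  # indices of instructions whose operand is an address that must be adjusted


def load(image, relocations, base_address):
    """Copy the image into simulated memory at base_address and patch the listed addresses."""
    memory = {}
    for index, instruction in enumerate(image):
        memory[base_address + index] = instruction
    for index in relocations:
        opcode, target = image[index]
        memory[base_address + index] = (opcode, base_address + target)
    entry_point = base_address  # assume execution starts at the first instruction
    return memory, entry_point


memory, entry = load(IMAGE, RELOCATIONS, base_address=0x1000)
print(hex(entry), memory[0x1000])  # 0x1000 ('CALL', 4099)
```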

Toon Krijthe answered Sep 30 '22 14:09


In basic terms, you write source files. These are fancy text files that the compiler reads and turns into some form of executable code (what executes it depends on the type of code you're talking about). The compiler has several parts:

  • A preprocessing step that handles macros and the like (as in C).
  • A parser, which takes in source files, verifies that they conform to the syntactic rules of your language, and transforms them into an in-memory data structure that is more easily manipulated by other parts of the program. This is called an Abstract Syntax Tree or AST.
  • Some form of AST analysis, which verifies that the code you wrote does not violate any rules of the language (e.g. recursion in a language that does not support it), along with other checks such as type checking and name resolution.
  • Optimization passes, such as tail call optimization, loop optimization, and many others.
  • Code generation, which is the actual process of taking the final AST and any other generated data and turning it into a binary file of some sort that can be executed or interpreted.
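Python's standard ast module lets you watch several of these stages on real Python code. A minimal sketch: ast.parse builds the AST, a hypothetical ConstantFolder pass (a toy optimization written for this example, not CPython's own optimizer) folds constant arithmetic, and the built-in compile() generates bytecode from the transformed tree. ast.unparse requires Python 3.9 or later.

```python
import ast
import operator

# Map AST operator node types to the Python functions that compute them.
FOLDABLE = {ast.Add: operator.add, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """A toy optimization pass: replace `constant op constant` with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first (bottom-up)
        op = FOLDABLE.get(type(node.op))
        if op and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


tree = ast.parse("a = 6 + 2 * 5 + b")                            # parsing: text -> AST
tree = ast.fix_missing_locations(ConstantFolder().visit(tree))  # optimization pass
print(ast.unparse(tree))                                        # a = 16 + b
code = compile(tree, "<example>", "exec")                       # code generation: AST -> bytecode
namespace = {"b": 1}
exec(code, namespace)
print(namespace["a"])                                           # 17
```

The pass cannot fold the final + b because b is only known at run time, so that part is left for the generated code.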

Interpreter:

An interpreter is a program that takes in some representation of a program (for example, binary data such as bytecode) that has not been compiled to code directly executable by the target machine, and runs the commands within. Examples are Python, Java, and Lua.
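You can see this with CPython, the standard Python implementation: the built-in compile() turns source text into a code object holding bytecode, dis.get_instructions() lists those instructions, and exec() hands them to the interpreter loop. Instruction names and counts differ between Python versions, so treat the printed listing as illustrative.

```python
import dis

# CPython first compiles source text into a code object holding bytecode...
code = compile("a = 6 + b * c", "<example>", "exec")
print(type(code.co_code), len(code.co_code), "bytes of bytecode")

# ...which its interpreter loop then executes instruction by instruction.
for instruction in dis.get_instructions(code):
    print(instruction.opname, instruction.argrepr)

namespace = {"b": 2, "c": 5}
exec(code, namespace)  # run the bytecode in the interpreter
print(namespace["a"])  # 16
```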

Native code:

This is code that has been compiled into native instructions directly executable by the target machine. For instance, if you run on an x86 architecture, a C++ compiler will produce an executable file containing x86 instructions that the processor understands directly.

Virtual Machine:

This is generally a program built to simulate the construction and operation of a processor. It may be as simple as a program that reads in bytecode and runs native language operations based on the commands the bytecode represents (though calling this a virtual machine may be a stretch), or it may be as complex as completely simulating the behavior of a processor and all associated peripherals.
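A minimal sketch of the simple end of that spectrum: a fetch-decode-execute loop that reads bytecode and performs the corresponding operations in the host language. The one-byte opcodes, the constant pool and name table layout, and the function run are all hypothetical, loosely following how stack-based bytecode interpreters are commonly organized.

```python
# Hypothetical one-byte opcodes for a toy stack machine.
PUSH, LOAD, STORE, ADD, MUL, HALT = range(6)


def run(bytecode, constants, names, variables):
    """Fetch-decode-execute loop: read an opcode byte, act on it, repeat."""
    stack = []
    pc = 0  # program counter: index of the next byte to read
    while True:
        opcode = bytecode[pc]
        pc += 1
        if opcode == PUSH:     # operand byte indexes the constant pool
            stack.append(constants[bytecode[pc]])
            pc += 1
        elif opcode == LOAD:   # operand byte indexes the name table
            stack.append(variables[names[bytecode[pc]]])
            pc += 1
        elif opcode == STORE:  # pop the top of the stack into a variable
            variables[names[bytecode[pc]]] = stack.pop()
            pc += 1
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == HALT:
            return variables
        else:
            raise ValueError(f"bad opcode {opcode} at {pc - 1}")


# a = 6 + b * c, encoded as bytes: PUSH 6; LOAD b; LOAD c; MUL; ADD; STORE a; HALT
program = bytes([PUSH, 0, LOAD, 1, LOAD, 2, MUL, ADD, STORE, 0, HALT])
result = run(program, constants=[6], names=["a", "b", "c"], variables={"b": 2, "c": 5})
print(result["a"])  # 16
```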

Those other answers have good points in them, but this info and these links ought to get you started. If you have any other questions, just ask!

(Most of this article was written with the help of Wikipedia, though some was written from memory.)

RCIX answered Sep 30 '22 14:09


Compilers, interpreters, and virtual machines are just implementation details. What you might be looking for is programming language theory, generative grammars, and language translators; you may also need some computer architecture to relate the theory to implementations.

Personally, I learned from Sebesta's book (Concepts of Programming Languages). It gives a very broad introduction to the subject without going into minute details. It also has a good chapter on the history of programming languages (~20 languages, ~3 pages per language). It has a nice explanation of grammars and the theory of languages in general. It also gives a good introduction to Scheme, Prolog, and programming paradigms (logic, functional, imperative^, object-oriented).

^ It concentrates a lot more on the imperative paradigm than on the first two.

Khaled Alshaya answered Sep 30 '22 14:09


This site has a great series of lectures on the Structure and Interpretation of Computer Programs, which is exactly the kind of thing you want to learn. The accompanying textbook is useful too, though I haven't personally read through the whole thing. I think watching the lectures alone gets you about 60% of the way there.

Chii answered Sep 30 '22 12:09