
Why do compilers only look backward for type and function declarations?

Tags:

c++

c

parsing

This is purely to satisfy my own curiosity, but why are functions and types only resolved against those defined earlier in the code, rather than anywhere within the same scope?

Sometimes it shows up when functions need to call each other:

void foo(int depth) { bar(depth + 2); }
void bar(int depth) { if (depth > 0) foo(depth - 3); }

Where you either need to reorder the functions (which doesn't actually help here, since they call each other) or declare bar beforehand:

void bar(int);
void foo(int depth) { bar(depth + 2); }
void bar(int depth) { if (depth > 0) foo(depth - 3); }

In more complex examples you may need to trace the #include tree and #ifndef guards to figure out why your code doesn't recognize a function or type defined in some other file, and that's assuming you already know you're supposed to check for that.


And then of course there's this classic:

typedef struct {
    Node *next;
} Node;

Where you need to know that this is possible:

typedef struct Node {
    struct Node *next;
} Node;

...but apparently many people don't know that, so they just use 'void *' or 'struct Node' everywhere.


So, is there a reason why these compilers don't resolve forward as well? I can understand the preprocessor only looking backward, since macros can be #undefined, but once a type or function is declared it's there for good and can't be redefined by a later definition.

Is it for historical reasons, with the limited technology that was available when they were first developed? Or is there some logical ambiguity I'm missing?

asked Apr 21 '14 by BonzaiThePenguin

2 Answers

The answer to your question is simply "it would make the compiler much harder to write otherwise". The language specification says it has to be this way, and the reason for that wording in the specification is that it makes compilers easier to write. To some degree it's probably also that, in the old days, compilers generated code "as they went along" rather than reading all the code in a translation unit (source file) first and then processing it.

Bear in mind that C and C++ compilers are still used on machines that don't have huge amounts of memory or very fast processors. So, if you try to compile a LARGE amount of code on a small-capacity machine, the "we don't want to read ALL of the source first, then parse it" approach makes more sense than it does on a 16GB, quad-core desktop machine. That said, you could probably load the entire source code for a fairly large project into memory all at once (for example, all of the files in LLVM + Clang come to around 450MB, which would easily fit in memory on a modern desktop/laptop).

Edit: It should be noted that "interpreted languages", such as PHP, JavaScript and Basic, typically don't have this requirement, but other compiled languages typically do - for example, Pascal has a special keyword forward that tells the compiler "this function exists, but I'll tell you what it contains later".

Both Pascal and C (and C++, because it's based on C in this respect) allow pointers to structures that aren't complete yet. Just this simple "you don't have all the information yet" case means the compiler has to build up the type information and then "go back and fix it up" [obviously only as required]. But it allows us to do:

struct X
{
   struct X* next;
   ... 
};

or in C++:

struct X
{
   X* next; 
   ...
};
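To make the incomplete-type point a bit more concrete, here is a minimal C sketch (the struct names are made up for illustration): a forward declaration of a struct tag introduces an incomplete type, and pointers to incomplete types are allowed, so two structures can refer to each other:

struct Node;                    /* forward declaration: Node is an incomplete type here */

struct List
{
    struct Node *head;          /* pointer to an incomplete type is fine */
};

struct Node
{
    struct Node *next;
    int value;                  /* the type is completed by this definition */
};

The compiler only needs the size of a pointer at that point, not the size of struct Node itself, which is why the declaration-order rules can be relaxed for pointers but not for members held by value.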

Edit 2: This blog post by Jan Hubicka, a GCC developer, explains some of the problems with "dealing with all the code at once". Of course, most of us aren't compiling Firefox or similarly sized projects, but large projects, where you have to deal with ALL of the code at once, do run into "not enough memory available" problems even on modern machines if the developers don't "put the compiler on a diet from time to time".

http://hubicka.blogspot.co.uk/2014/04/linktime-optimization-in-gcc-1-brief.html

answered Oct 19 '22 by Mats Petersson


The reason is that having all the necessary information about an entity present when it is needed allows the compiler to translate a source file in a single pass. Cf. http://blogs.msdn.com/b/ericlippert/archive/2010/02/04/how-many-passes.aspx for a comparison with C# (which doesn't require previous declarations; but then everything is in a class anyway).

C shares this feature with Pascal and a few other languages. Writing such a compiler is easier and, as others have pointed out, it tends to use less memory (but potentially increases compilation time, paradoxically, because the declarations in headers are parsed and compiled over and over again for each translation unit).
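As a rough illustration of that last point (the file names here are hypothetical): include guards only prevent double inclusion within a single translation unit, so every .c file that includes a header still re-parses the same declarations.

/* point.h - re-parsed by every translation unit that includes it */
#ifndef POINT_H
#define POINT_H

struct Point { int x, y; };
int manhattan(struct Point a, struct Point b);

#endif

/* a.c and b.c each contain #include "point.h", so the compiler
   parses the declarations above twice - once per translation unit. */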

answered Oct 19 '22 by Peter - Reinstate Monica