Google's new language "Go" says on its website:
the language has been designed to be easy to analyze and can be parsed without a symbol table
I'm certainly no expert on these matters, but I thought a symbol table was a basic construct common to all compilers for languages that use variables, and Go clearly uses variables. What am I not understanding?
Parsing means just figuring out the program structure: separating the module into statements/declarations, breaking expressions down to sub-expressions, etc. You end up with a tree structure, known as a "parse tree", or "abstract syntax tree" (AST).
Apparently, C++ requires a symbol table to do parsing.
This page discusses some reasons why C++ requires a symbol table for parsing.
Of course, parsing is only a part of compilation, and you will need a symbol table to do a full compilation.
However, parsing itself can be useful in writing analysis tools (e.g. which module imports which modules). So, simplifying the parsing process means it's easier to write code analysis tools.
@Justice is right. To expand on that a little, in C the only actual tricky part is telling types apart from variables. Specifically when you see this:
T t;
You need to know that T
is a type for that to be a legal parse. That's something you have to look up in a symbol table. This is relatively simple to figure out as long as types are added to the symbol table as the parse continues. You don't need to do much extra work in the compiler: either T
is present in the table or it isn't.
In C++ things are much, much more complicated. There are enormous numbers of ambiguous or potentially ambiguous constructs. The most obvious is this one:
B::C (c);
Aside from the fact that it's not clear if B
is a class
, a typedef
, or a namespace
, it's also not clear if C
is a type and c
an object of that type, or if C
is a function (or constructor) taking c
as an argument (or even if C is an object with operator()
overloaded). You need the symbol table to carry on parsing, although it is still possible to continue quickly enough, as the type of the symbol is in the symbol table.
Things get much, much, much worse than that when templates come into the mix. If C (c)
is in a template, you might not know in the actual definition of the template, if C is a type or a function/object. That's because the template can declare C
to be either a type or a variable. What this means is that you need the symbol table, but you don't have one -- and you can't have one until the template is actually declared. Even worse, it's not necessarily sufficient to have just the type of the symbol: you can come up with situations which require the full information of the type the symbol represents, including size, alignment, and other machine-specific information.
All this has several practical effects. The two most significant I would say are:
Interpretation and compilation absolutely require symbol tables or similar. This is true for nearly all languages.
In C and C++, even parsing the language requires a symbol table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With