I am currently working on a compiler under C and I am abit lost at the part where we construct the data structure for AST, especially for the part where we construct stucture for IDs, it is called "Symbol Table Entry"
I see structures over net such as:
struct ste {
struct id *name; /* pointer into hash table for assoc. id */
struct decl *decl; /* pointer into symbol table for its decl */
struct ste *prev; /* pointer to previous entry in symbol table */
};
It looks like a linked list as it holds a pointer to the previous entry (*prev) but what is the logic behind this ?
The answer to your concrete question is this: the prev link means that, when your code has a pointer to one of these node, it can follow the link to the previous link in the chain. One reason a symbol table might have a list like this is to deal with nested scope:
{
int x;
{
int x;
}
}
However, there are many, many other reasons why the symbols nodes might want to be arranged in a list. Any reason why the compiler needs to visit all the nodes is a reason.
You're seeing the leftovers of a pernicious habit from C programmers long ago: it is assumed that symbols will be on some lists, and instead of allocating list structures separately, the list pointers are included as part of the symbol structure. This trick saves one allocation per list element, but at a cost: the set of lists that a symbol can be on is fixed, and this structure confuses programmers. If the application is compilers, there is no reason to use this trick any more. It is much clearer to have a separate list structure that is defined something like this:
struct ste_list {
struct ste *symbol_table_entry;
struct str_list *next;
};
You can have as many of these as you like, and nobody is the wiser. And the internal pointers you find confusing go away.
You ask
what is the logic behind this?
Part of the answer is simply that it's useful to have symbols on a distinguished list. I can't answer the question definitively without knowing more about the particular compiler. My best guess is that the prev
entry is going to be used to implement nested scopes (the { ... }
brackets in C), but that's a guess based on compilers I've seen or worked on. So perhaps the logic is that when a closing brace is encountered, the compiler might follow that link until it gets to an ste
in an enclosing scope. People with a bit more experience than the author of the compiler you're studying will generally put this logic in a "symbol-table abstraction" which will include functions like enterscope()
and exitscope()
, and the details of these operations will be hidden from the internal representation of individual symbol-table entries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With