Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does Symbol table for C++ code contain function names along with class names?

I have been searching through various posts regarding whether symbol table for a C++ code contains functions' name along with the class name. Something which i could find on a post is that it depends on the type of compiler,

if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table

but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.

I could not understand whether it is actually compiler dependent or not? I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes? I don't have such a great/deep knowledge. Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?

like image 264
user2181750 Avatar asked Aug 14 '15 13:08

user2181750


1 Answers

Most compiler textbooks will tell you about symbol tables, and often show you details about a modest complexity langauge such as Pascal. You won't find information about C++ symbol tables in a textbook; it is too arcane.

We offer a complete C++14 front end for our DMS Software Reengineering Toolkit. It parses C++, builds detailed ASTs, and performs name-and-type resolution, which includes building a precise symbol table.

What follows are slides from our tutorial on how to use DMS, focused on the C++ symbol table structures.

OP asked specifically for a view of what happens with classes. The following diagram shows this for the tiny C++ program in the upper left corner. The rest of the diagram shows boxes, which represent what we call "symbol spaces" (or "scopes"), which are essentially hash tables mapping symbol names (each box lists the symbols it owns) to the information that DMS knows about that symbol (source file location of definition, list of AST nodes that reference the definition, and a complex union that represents the type, and that may in turn point to other types). The arrows show how symbol spaces are connected; an arrow from space A to space B means "scope A is contained within scope B". Typically the symbol space lookup process, searching scope A for a symbol x, will continue the search in scope B if x is not found in A. You'll note the arrows are numbered with an integer; this tells the search machinery to look in the least-numbered parent scope first, before trying to search scopes using arrows with larger numbers. This is how scopes are ordered (note Class C inherits from A and B; any lookup of a field in class C such as "b" will be forced to first look in the scope for A, and then in the scope for B. In this way, the C++ lookup rules are achieved.

Note the the class names are recorded in the (unique) global namespace because they is declared at top level. If they had been defined in some explicit namespace, then the namespace would have a corresponding symbol space of its own that recorded the declared classes, and the namespace itself would be recorded in the global symbol space.

C++ Symbol Table: Class Perspective

OP did not ask what the symbol table looks like for function bodies, but I just so happen to have an illustrative slide for that that, too, below. The symbol spaces work the same way. What is shown in this slide is the linkage between a symbol space, and the scoped region it represents. That linkage is actually implemented by a pointer associated with the symbol space, to the corresponding AST(s, namespace definitions can be scattered around in multiple places).

Note that in this case, the function name is recorded in the global namespace because it is declared at top level. If it had been defined inside the scope of a class, the function name would have been recorded in the symbol space for the class body (on previous diagram).

C++ Symbol Table: Function Perspective

As a general rule, the details of how the symbol table is organized is completely dependent on the compiler, and the choices the designers made. In our case, we designed a very general symbol table management package because we planned (and have) used the same package to handle multiple languages (C, C++, Java, COBOL, several legacy languages) in a uniform way. However, the abstract structures of symbol spaces and inheritance will have to implemented in essentially equivalent ways across C++ compilers; after all, they have to model the same information. I'd expect similar structures in the GCC and Clang compilers (well, the integer-numbered inheritance arcs, maybe not :)

As a practical matter, it doesn't matter how many "passes" your compiler has. It pretty much has to build these structures to remember what it knows about the symbols, within a pass, and across passes.

While building a C++ parser is very hard by itself, building such a symbol table is much harder. The effort dwarfs the effort to build the C++ parser. Our C++ name resolver is some 250K SLOC of attribute-grammar code compiled and executed by DMS. Getting the details rights is an enormous headache; the C++ reference manual is enormous, confusing, the facts are scattered everywhere across the document, and in a variety of places it is contradictory (we try to send complaints about this to the committee) and or inconsistent between compilers (we have versions for GCC and Visual Studio 201x).

Update March 2017: Now have symbol tables for C++2014. Update June 2018: Now have symbol tables for C++2017.

like image 89
Ira Baxter Avatar answered Sep 18 '22 22:09

Ira Baxter