 

How does a symbol table relate to static chains and scoping?

I am taking a principles of programming languages course right now but I cannot for the life of me figure this out. This is not homework just a general concept question.

In our class we have talked about static chains and displays. I think that I understand why we need these: without them, when we have nested methods we cannot figure out which variable a name refers to.

My prof has also talked about a symbol table. My question is what is the symbol table used for? How does it relate to the static chains?

I will give some background (please correct me if I am wrong).


(I am going to define a few things just to make explanations easier)

Suppose we have this code:

main(){
    int i;
    int j;
    int k;
    a(){
        int i;
        int j;
        innerA(){
            int i = 5;
            print(i);
            print(j);
            print(k);
        }
    }

    b(){
        ...
    }
    ...
}

And this stack:

| innerA  |
| a       |
| b       |
| main    |
-----------              

Quick description of static chains as a refresher.

Static chains are used to find which variable should be used when variables are redefined inside an inner function. In the stack shown above each frame will have a pointer to the method that contains it. So:

| innerA  | \\ pointer to a
| a       | \\ pointer to main
| b       | \\ pointer to main
| main    | \\ pointer to global variables
-----------        

(Assuming static scoping, for dynamic scoping I think that every stack frame will just point to the one below it)

I think that when we execute print(<something>) inside the innerA method this will happen:

currentStackFrame = innerAStackFrame;
while(true){
    if(<something> is declared in currentStackFrame){
        print(<something>);
        break;
    }
    else{
        currentStackFrame = currentStackFrame.containedIn();
    }
}
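
The loop above can be sketched in Python as a minimal illustration. The Frame class, its field names, and the variable values (other than i = 5 in innerA) are made up for this example; they are not from the course or from any particular compiler.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """One stack frame; static_link points at the frame of the enclosing function."""
    name: str
    local_vars: dict = field(default_factory=dict)
    static_link: Optional["Frame"] = None


def lookup(frame, var):
    """Walk the static chain outward until some frame declares var."""
    current = frame
    while current is not None:
        if var in current.local_vars:
            return current.name, current.local_vars[var]
        current = current.static_link
    raise NameError(var)


# Frames mirroring the example stack (values are hypothetical).
main_frame = Frame("main", {"i": 1, "j": 2, "k": 3})
b_frame = Frame("b", {}, static_link=main_frame)
a_frame = Frame("a", {"i": 10, "j": 20}, static_link=main_frame)
inner_a_frame = Frame("innerA", {"i": 5}, static_link=a_frame)

for var in ("i", "j", "k"):
    print(var, lookup(inner_a_frame, var))
# i ('innerA', 5)
# j ('a', 20)
# k ('main', 3)
```

Note that b sits between a and main on the stack but is never visited, because the static link of a points at main. In a compiled implementation the search by name is typically resolved at compile time, so the generated code only follows a fixed number of links.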

Quick refresher of symbol table

I am not really sure what a symbol table is for. But this is what it looks like:

Index is hash value,
Value is reference.
 __
|  |
|--|                        --------------------------------------------------
|  | --------------------> | link to next | name | type | scope level | other |
|--|                        --------------------------------------------------
|  |                              |
|--|                ---------------
|  |                |    
|--|                |             --------------------------------------------------
|  |                 ------->    | link to next | name | type | scope level | other |
|--|                              --------------------------------------------------
|  |
|--|
  • link to next - if more than one thing has the same hash value, this is a link to the next entry in that bucket
  • name - name of the element (examples: i, j, a, int)
  • type - what the thing is (examples: variable, function, parameter)
  • scope level - not 100% sure how this is defined. I think that:
    • 0 would be built-ins
    • 1 would be globals
    • 2 would be main method
    • 3 would be a and b
    • 4 would be innerA
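
The picture and field list above can be sketched as a small hash table with chained buckets. This is only an illustration under assumptions: the class names, the bucket count, and the choice to insert new entries at the front of a bucket are hypothetical, and the "other" field is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str                        # e.g. i, j, a
    kind: str                        # the "type" field: variable, function, parameter
    scope_level: int                 # 2 = main, 3 = a and b, 4 = innerA (as guessed above)
    next: Optional["Entry"] = None   # "link to next" entry in the same bucket


class SymbolTable:
    def __init__(self, size=8):
        self.buckets = [None] * size

    def _index(self, name):
        return hash(name) % len(self.buckets)

    def insert(self, name, kind, scope_level):
        idx = self._index(name)
        # New entries go at the front, so the innermost declaration is found first.
        self.buckets[idx] = Entry(name, kind, scope_level, self.buckets[idx])

    def lookup(self, name):
        entry = self.buckets[self._index(name)]
        while entry is not None:
            if entry.name == name:
                return entry
            entry = entry.next
        return None


table = SymbolTable()
table.insert("i", "variable", 2)   # main's i
table.insert("k", "variable", 2)   # main's k
table.insert("i", "variable", 3)   # a's i
table.insert("i", "variable", 4)   # innerA's i

print(table.lookup("i").scope_level)  # 4
print(table.lookup("k").scope_level)  # 2
```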

Just to restate my questions:

  • What is the symbol table used for?
  • How does it relate to the static chains?
  • Why do we need static chains since the scope information is in the symbol table?
sixtyfootersdude asked Aug 02 '10 17:08



2 Answers

Note that "symbol table" can mean two different things: it could mean the internal structure used by the compiler to determine which alias of a variable has scope where, or it could mean the list of symbols exported by a library to its users at load time. Here, you're using the former definition.

The symbol table is used to determine which memory address a user is referring to when they employ a certain name. When you say "x", which alias of "x" do you want?

The reason you need to keep both a static chain and a symbol table is this: when the compiler needs to determine which variables are visible in a certain scope, it needs to "unmask" the variables previously aliased in the inner scope. For instance, when moving from innerA back to a, the name i refers to a different memory address. The same thing happens again going from a to main. If the compiler did not keep a static chain, it would have to traverse the whole symbol table to find the entries to remove. That's expensive if you've got lots of names. With static chains, the compiler just looks at the current level, removes the last definition of each variable contained in it, and then follows the link up one scope. If, on the other hand, you didn't have the symbol table, then every access to a variable outside the local scope would force the compiler to walk the static chain.
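
One way to read the paragraph above is as a table that maps each name to a stack of declarations, plus a per-scope list of the names declared there, so that leaving a scope only touches that scope's names. The sketch below is a minimal illustration under that assumption. ScopedSymbolTable and its method names are hypothetical, and strings like "a.i" stand in for memory addresses.

```python
from collections import defaultdict


class ScopedSymbolTable:
    def __init__(self):
        self.table = defaultdict(list)  # name -> stack of declarations, innermost last
        self.scopes = []                # one list of declared names per open scope

    def enter_scope(self):
        self.scopes.append([])

    def declare(self, name, info):
        self.table[name].append(info)
        self.scopes[-1].append(name)

    def lookup(self, name):
        stack = self.table.get(name)
        if not stack:
            raise NameError(name)
        return stack[-1]                # the innermost visible declaration

    def exit_scope(self):
        # Remove only what this scope declared, which unmasks the outer definitions.
        for name in self.scopes.pop():
            self.table[name].pop()


st = ScopedSymbolTable()
st.enter_scope()                        # main
for name in ("i", "j", "k"):
    st.declare(name, "main." + name)
st.enter_scope()                        # a
st.declare("i", "a.i")
st.declare("j", "a.j")
st.enter_scope()                        # innerA
st.declare("i", "innerA.i")

print(st.lookup("i"), st.lookup("j"), st.lookup("k"))  # innerA.i a.j main.k
st.exit_scope()                         # leave innerA
print(st.lookup("i"))                   # a.i
st.exit_scope()                         # leave a
print(st.lookup("i"))                   # main.i
```

Here lookup never walks outward through scopes, and exit_scope never scans the whole table, which is the trade-off the paragraph describes.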

Summing up, you can reconstruct the symbol table from the static chain, and vice versa. But you really want to have both to make the common-case operations fast. If you lack the symbol table, compiling will take longer because each non-locally-scoped variable access will require climbing the static chain. If you lack the static chain, compiling will take longer because leaving a scope will require walking the symbol table to remove now-defunct entries.

Incidentally, if you're not already using Michael Scott's Programming Language Pragmatics, you should take a look at it. It's by far the best textbook on this topic I've seen.

Borealid answered Oct 07 '22 10:10



This obviously refers to some specific class implementation, and in order to understand it I'd strongly recommend talking to somebody connected with the class.

The symbol table is what translates source code identifiers into something the compiler can use. It keeps the necessary descriptions. It tends to be used throughout the compilation process. The "type" you mention looks like it would be intended for parsing, and there would doubtless be more entries (in the "other") for later stages.

It's hard to know how it relates to the static chains, or why they're needed, since you don't even know what the "scope level" is. However, note that both a() and b() may have a variable i, and you seem to think they've got the same scope level, so you need something to differentiate them.

Also, the static chain is frequently an optimization, so the compiler knows which symbol table entries to accept. Without a static chain, the compiler would have to do some lookups to reject an entry in b for something encountered in innerA.

To get anything more useful, you're going to have to explain more about what's going on (I'd strongly suggest talking to the instructor or TAs or whatever) and probably have more specific questions.

David Thornley answered Oct 07 '22 08:10
