Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure for storing variables in an interpreted language

I am designing my own experimental scripting language for the purpose of embedding it in my bigger application.

Almost everything I wanted to do was programmed smoothly, but the "simple" act of storing variables in memory appeared to be the hardest part here. I don't know how to store them to allow all type checking, global variables and special flags on them. First look at a sample code:

a = 1
b = 2

someFunction()
  print(a)   --> This should read the global variable and print `1`
  a = 3      --> Now `a` should become a local variable of this function
                 and the global `a` remain unchanged
  x = 4      --> `x` should always be local of this function
end

I call the "locality" of variables their levels so variables in nested blocks have a higher level. In the above code, a and b are level 1 variables. Local variables of someFunction will have level 2. The first line of the function should read the global variable a (level 1) but the second line should create a variable again called a but with level 2 that shadows the global a from that point onwards. The third line should create the variable x with level 2. How to store and keep track of all these in memory?

What I tried so far:

Method 1: Storing maps of variable=>value in array of levels:

variables
{
    level=1 //global variables
    {
        a => 1,
        b => 2
    },
    level=2 //function variables
    {
        a => 3,
        x => 4
    }
}

But that will make variable look-up really slow since one has to search all the levels for a given variable.

Method 2: Storing the (variable, level) pairs as keys of a map:

variables
{
    (a, 1) => 1, //global
    (b, 1) => 2, //global
    (a, 2) => 3, //function
    (x, 2) => 3  //function
}

This has the same problem as before since we have to try the pair (variable, level) with all possible levels for a given variable.

What method should I use for optimal memory usage and fastest access time?

Additional notes:

I know about how variables are managed on stack and heap on other "real" languages, but I find it tricky to do this on an interpreted language. "This mustn't be how Lua and Python do that," I always think. Correct me if I'm wrong. I'm trying to store the variable in maps and internal C++ structures.

And finally, this is how I represent a variable. Do you think it's big and there can be more memory-efficient representations? (I've also tried to put the "Level" as a member here but it had the same problem as the other too.)

struct Member
{
    uchar type;  //0=num, 1=str, 2=function, 3=array, etc
    uchar flags; //0x80 = read-only, 0x40 = write-only, etc
    union {
        long double value_num;
        char* value_str;
        int value_func;
        //etc
    };
};
like image 427
Hossein Avatar asked Feb 20 '12 10:02

Hossein


1 Answers

An easy thing to do, similar to your array, is to maintain a stack of maps. Each map contains the bindings for that scope. To bind a variable, add it to the top map; to look up a variable, start at the top of the stack and stop when you reach a map that contains a binding for that variable. Search takes a little bit, but starting from the top/end you only have to search until you find it — in most cases, this search will not be very long.

You can also make the stack implicit by encapsulating this logic in an Environment class that has local bindings and an inherited environment used for resolving unknown variables. Need to go into a new scope? Create a new environment with the current environment as its base, use it, then discard it when the scope is finished. The root/global environment can just have a null inherited environment. This is what I would probably do.

like image 155
Michael Ekstrand Avatar answered Oct 29 '22 01:10

Michael Ekstrand