 

About lexical scoping in R

I am fairly new to R and while I was reading the manuals I came across a passage about lexical scoping along with this code example:

 open.account <- function(total) {
   list(
     deposit = function(amount) {
       if(amount <= 0)
         stop("Deposits must be positive!\n")
       total <<- total + amount
       cat(amount, "deposited.  Your balance is", total, "\n\n")
     },
     withdraw = function(amount) {
       if(amount > total)
         stop("You don't have that much money!\n")
       total <<- total - amount
       cat(amount, "withdrawn.  Your balance is", total, "\n\n")
     },
     balance = function() {
       cat("Your balance is", total, "\n\n")
     }
   )
 }

 ross <- open.account(100)
 robert <- open.account(200)

 ross$withdraw(30)
 ross$balance()
 robert$balance()

 ross$deposit(50)
 ross$balance()
 ross$withdraw(500)

So, I understand what the above code does, but I'm still confused about exactly how it works. If you can still access a function's "local" variables after the function has finished executing, isn't it very hard or impossible to predict when a variable is no longer needed? In the code above, if it were used as part of a larger program, would "total" be kept in memory until the entire program was done (essentially becoming a global variable, memory-wise)? If so, wouldn't this cause memory use issues?

I've looked at two other questions on this site: "How is Lexical Scoping implemented?" and "Why are lexical scopes prefered by the compilers?". The answers there went right over my head, but they made me wonder: if (as I am guessing) the compiler isn't just making all variables global (memory-wise) and is instead using some technique to predict when certain variables won't be needed anymore and can be deleted, wouldn't doing this work actually make things harder on the compiler rather than easier?

I know that was a lot of different questions, but any help would be nice. Thanks.

Katana asked Jun 28 '13 15:06



1 Answer

OP seems to be looking for clarification about environments.

In R, every function[1] has an enclosing environment. This is the collection of objects that it knows about, in addition to those that are passed in as its arguments, or that it creates in its code.
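A minimal sketch of those three sources of names, using the hypothetical names `fee` and `with_fee`. It assumes the code is run at the top level (the prompt or `Rscript`), so the function's enclosing environment is the global environment:

```r
fee <- 2                          # created at the prompt: lives in the global environment

with_fee <- function(amount) {    # 'amount' arrives as an argument
  charged <- amount + fee         # 'charged' is created by the function's own code;
  charged                         # 'fee' is neither, so R looks it up in the enclosing environment
}

with_fee(10)                      # 12
environment(with_fee)             # <environment: R_GlobalEnv>
```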

When you create a function at the prompt, its environment is the global environment. This is just the collection of objects in your workspace, which you can see by typing ls(). For example, if your workspace contains a data frame Df, you could create a function like the following:

showDfRows <- function()
{
    # Df is not an argument: it is found in the function's enclosing (global) environment
    cat("The number of rows in Df is:", nrow(Df), "\n")
    invisible(NULL)
}

Your function knows about Df even though you didn't pass it in as an argument; it exists in the function's environment. Environments can be nested, which is how things like package namespaces work. You can, for example, do lm(y ~ x, data=Df) to fit a regression, even though your workspace doesn't contain any object called lm. This is because the global environment's chain of parents includes the stats package, which is where the lm function lives.[2]
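One way to see that chain of parents is to walk it with `parent.env()` until reaching the empty environment. This is a sketch; `env_chain` is a hypothetical helper, and the exact list depends on which packages are attached in your session:

```r
# Hypothetical helper: names of the environments that lookup from 'env' walks through
env_chain <- function(env = globalenv()) {
  chain <- character(0)
  while (!identical(env, emptyenv())) {
    chain <- c(chain, environmentName(env))
    env <- parent.env(env)
  }
  chain
}

print(env_chain())                        # e.g. "R_GlobalEnv" "package:stats" ... "base"

# lm is reached through the attached "package:stats" environment;
# the function itself is defined in the stats namespace
print(environmentName(environment(lm)))   # "stats"
```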

When functions are created inside another function, their enclosing environment is the evaluation frame of their parent function. This means that the child function can access all the objects known to the parent. For example:

f <- function(x)
{
    g <- function()
    {
        cat("The value of x is", x, "\n")
    }
    g()    # calling g prints the x that was passed to f
    return(NULL)
}

Notice that g doesn't contain any object called x, nor are any of its arguments named x. However, calling g still works, because it retrieves x from the evaluation frame of its parent f.
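To make that frame visible, a variant can return g instead of calling it, and then inspect g's enclosing environment. A minimal sketch with the hypothetical names `make_g` and `g1`:

```r
make_g <- function(x) {
  g <- function() {
    x * 2        # g has no argument or local variable called x
  }
  g              # return g itself so its environment can be inspected afterwards
}

g1 <- make_g(21)
g1()                                  # 42
get("x", envir = environment(g1))     # 21: x lives in make_g's evaluation frame
```

The parent of that frame is in turn the global environment, where make_g was defined, so lookup continues outward from there.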

This is the trick that the code above is using. When you run open.account, it creates an evaluation frame in which to execute its code. open.account then creates three functions: deposit, withdraw and balance. Each of these three has as its enclosing environment the evaluation frame of open.account. In this evaluation frame there is a variable called total, whose value was passed in by you, and which will be manipulated by deposit, withdraw and balance. deposit and withdraw use <<-, which assigns to the existing total in that enclosing frame instead of creating a new local variable.

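When open.account completes, it returns a list. Ordinarily, once a function returns, nothing refers to its evaluation frame any more, so R's garbage collector can dispose of it. In this case, however, the returned list contains functions whose enclosing environment is that frame, so the frame stays in existence for as long as any of those functions can still be reached. Once they are gone, the frame can be collected like any other object.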
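The sketch below makes the kept-alive frame visible. `make_account` is a hypothetical, trimmed-down version of open.account (only deposit and balance, and balance returns the value instead of printing it):

```r
make_account <- function(total) {
  list(
    deposit = function(amount) {
      total <<- total + amount      # updates 'total' in make_account's frame
      invisible(total)
    },
    balance = function() total
  )
}

acct <- make_account(100)
acct_env <- environment(acct$balance)        # the evaluation frame of this make_account call
ls(acct_env)                                 # "total"
acct$deposit(50)
get("total", envir = acct_env)               # 150
identical(acct_env, environment(acct$deposit))   # TRUE: both functions share one frame
```

Nothing here is global: after `rm(acct, acct_env)`, no object refers to that frame any more, so the garbage collector is free to reclaim it on a later collection.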

So, why don't Ross' and Robert's accounts clash with each other? Every time you execute open.account, R creates a new evaluation frame. The frames from opening Ross' and Robert's accounts are completely separate, just as running lm(y ~ x, data=Df) creates a separate frame from running lm(y ~ x, data=Df2). Each time open.account returns, it brings with it a new environment in which to store the balance just created. (The returned list also contains new copies of the deposit, withdraw and balance functions, but generally we can afford to ignore the memory used for this.)
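A short sketch of that separation, using a hypothetical two-function factory `make_balance` in place of open.account:

```r
make_balance <- function(total) {
  list(
    deposit = function(x) total <<- total + x,   # updates this call's own 'total'
    balance = function() total
  )
}

ross   <- make_balance(100)   # first call: one evaluation frame
robert <- make_balance(200)   # second call: a separate frame

ross$deposit(50)
ross$balance()                # 150
robert$balance()              # 200, untouched by Ross' deposit
identical(environment(ross$balance), environment(robert$balance))   # FALSE
```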

[1] technically every closure, but let's not muddy things

[2] again, there's a technical distinction between namespaces and environments but it isn't important here

Hong Ooi answered Sep 26 '22 23:09
