 

Global variable in a package - which approach is more recommended?

Tags: package, r

I do understand that global variables are generally evil and that I should avoid them, but if my package does need a global variable, which of these two approaches is better? And are there any other recommended approaches?

  1. Using an environment visible to the package

    pkgEnv <- new.env()  
    pkgEnv$sessionId <- "xyz123"
    
  2. Using options

    options("pkgEnv.sessionId" = "xyz123")
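For comparison, here is a minimal sketch of how each of the two approaches above is typically written to and read from inside a package. The helper names (setSessionId, getSessionId, setSessionIdOption, getSessionIdOption) are hypothetical; the option name is the one from the question.

```r
# Approach 1: a package-level environment, created once at the top level
# of the package code. The binding `pkgEnv` gets locked when the namespace
# is sealed, but the contents of the environment remain modifiable.
pkgEnv <- new.env(parent = emptyenv())

# hypothetical setter/getter helpers
setSessionId <- function(id) {
  pkgEnv$sessionId <- id
  invisible(id)
}
getSessionId <- function() {
  # `$` on an environment returns NULL if nothing has been stored yet
  pkgEnv$sessionId
}

# Approach 2: options(), a single session-wide list that users can also
# inspect and change themselves with options()/getOption()
setSessionIdOption <- function(id) {
  options(pkgEnv.sessionId = id)
  invisible(id)
}
getSessionIdOption <- function() {
  # getOption() falls back to `default` when the option is unset
  getOption("pkgEnv.sessionId", default = NA_character_)
}

setSessionId("xyz123")
setSessionIdOption("xyz123")
```

One way to read the trade-off: the environment keeps the value private to the package, while an option is visible to and settable by users, which tends to suit user-facing settings more than internal state.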
    

I know there are other threads that ask how to create global variables, but I haven't seen a discussion of which approach is recommended.

asked Dec 19 '22 07:12 by DeanAttali


2 Answers

Some packages use hidden variables (variables whose names begin with a .), as base R does with .Random.seed and .Last.value. In your package you could do

e <- new.env()
assign(".sessionId", "xyz123", envir = e)
ls(e)
# character(0)
ls(e, all.names = TRUE)
# [1] ".sessionId"

But in your package you don't need to create e yourself. You can use an .onLoad() hook to assign the variable when the package is loaded.

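.onLoad <- function(libname, pkgname) {
    # environment() is this hook's own evaluation frame; its parent is the
    # package namespace, which is still unlocked while .onLoad() runs
    assign(".sessionId", "xyz123", envir = parent.env(environment()))
}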
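One thing to keep in mind with this approach: once loading finishes, R locks the package namespace together with its bindings, so a variable assigned this way can be read later but not reassigned (short of unlockBinding()). A common workaround is to store an environment in the namespace and keep the mutable values inside it. The sketch below imitates that sealing with lockEnvironment() on a stand-in environment; ns is hypothetical and only plays the role of the real namespace.

```r
# `ns` is a hypothetical stand-in for a package namespace; the real one
# is created and sealed by loadNamespace()
ns <- new.env()

# what .onLoad() might do: assign a hidden variable...
assign(".sessionId", "xyz123", envir = ns)
# ...and, alternatively, keep mutable state inside a stored environment
assign("pkgEnv", new.env(parent = emptyenv()), envir = ns)
assign("sessionId", "xyz123", envir = ns$pkgEnv)

# imitate sealing: lock the environment and all of its bindings
lockEnvironment(ns, bindings = TRUE)

# reassigning the hidden variable now signals an error
res <- tryCatch({
  assign(".sessionId", "abc456", envir = ns)
  "changed"
}, error = function(e) "locked")

# the stored environment's contents can still change: only the binding
# `pkgEnv` is locked, not the environment it points to
assign("sessionId", "abc456", envir = ns$pkgEnv)
```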

See this question and its answers for a good explanation of package variables.

answered Jan 11 '23 23:01 by Rich Scriven


When most people say you should avoid 'global' variables, they mean that you should not assign to the global environment (.GlobalEnv, globalenv(), or as.environment(1)), or that you should not pass information between internal functions by any means other than passing it as the arguments of a function call.

Caching is another matter entirely. I often cache results that I don't want to re-calculate (memoization) or re-query. A pattern I use a lot in packages is the following:

myFunction <- local({

    cache <- list() # or numeric(0) or whatever suits the answers

    function(x, y, z) {

        # calculate the index (a character key) of the answer
        # (most of the time this is a trivial calculation, often the identity function);
        # answerIndex() is a placeholder for your own key function
        indx <- answerIndex(x, y, z)

        # check if the answer is stored in the cache
        if (indx %in% names(cache))
            # if so, return the answer
            return(cache[[indx]])

        # otherwise, do lots of calculations or data queries;
        # expensiveComputation() is a placeholder for that work
        answer <- expensiveComputation(x, y, z)

        # store the answer in the environment created by local()
        cache[[indx]] <<- answer

        answer
    }

})

The call to local() creates a new environment in which I can store results using the scoping assignment operator <<-, without having to worry that the package namespace has already been sealed. The last expression (the function definition) is returned as the value of local() and bound to the name myFunction, so that new environment becomes the function's enclosing environment and the cache persists between calls.
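To see the local() caching pattern work end to end, here is a minimal runnable instance. slowSquare() and the calls counter are hypothetical, and x^2 merely stands in for an expensive computation or query; the counter records how many times the work actually runs.

```r
# hypothetical counter, incremented only when the "expensive" work runs
calls <- 0

slowSquare <- local({

    cache <- list()

    function(x) {
        # use the argument, converted to a string, as the cache key
        indx <- as.character(x)

        # cache hit: return the stored answer without recomputing
        if (indx %in% names(cache))
            return(cache[[indx]])

        # stand-in for an expensive computation or query
        calls <<- calls + 1
        answer <- x^2

        # `<<-` finds `cache` in the environment created by local()
        cache[[indx]] <<- answer

        answer
    }

})

slowSquare(4)  # computed
slowSquare(4)  # served from the cache
slowSquare(5)  # computed
```

Because the cache lives in the environment created by local() rather than directly in the package namespace, it stays writable after the namespace is sealed; environment(slowSquare)$cache lets you inspect it while debugging.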

answered Jan 11 '23 22:01 by Jthorpe