
How does local() differ from other approaches to closure in R?

Yesterday I learned from Bill Venables how local() can help create static functions and variables, e.g.,

example <- local({
  hidden.x <- "You can't see me!"
  hidden.fn <- function(){
    cat("\"hidden.fn()\"")
  }
  function(){
    cat("You can see and call example()\n")
    cat("but you can't see hidden.x\n")
    cat("and you can't call ")
    hidden.fn()
    cat("\n")
  }
})

which behaves as follows from the command prompt:

> ls()
[1] "example"
> example()
You can see and call example()
but you can't see hidden.x
and you can't call "hidden.fn()"
> hidden.x                 
Error: object 'hidden.x' not found
> hidden.fn()
Error: could not find function "hidden.fn"

I've seen this kind of thing discussed in "Static Variables in R", where a different approach was employed.

What are the pros and cons of these two methods?

David Lovell asked Oct 26 '11 11:10


2 Answers

local() can implement a singleton pattern -- e.g., the snow package uses this to track the single Rmpi instance that the user might create.

getMPIcluster <- NULL
setMPIcluster <- NULL
local({
    cl <- NULL
    getMPIcluster <<- function() cl
    setMPIcluster <<- function(new) cl <<- new
})

local() might also be used to manage memory in a script, e.g., by allocating inside the block the large intermediate objects required to create a final object, which is returned by the last expression of the block. The large intermediate objects become available for garbage collection when local() returns.
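A minimal sketch of this use, with made-up object names and sizes chosen only for illustration: the intermediates exist solely inside the local() block, and only the final value is bound in the calling environment.

```r
# Hypothetical example: `big` and `centered` are illustrative names.
result <- local({
    big <- rnorm(1e6)              # large intermediate object
    centered <- big - mean(big)    # another large intermediate
    sd(centered)                   # value of the last expression is returned
})

# Neither intermediate is bound in the calling environment.
exists("big", inherits = FALSE)        # FALSE
exists("centered", inherits = FALSE)   # FALSE
```

Once local() has returned, nothing refers to the two vectors any more, so the garbage collector is free to reclaim them.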

Using a function to create a closure is a factory pattern, as in the bank account example in the An Introduction to R documentation, where a new account is created each time open.account is invoked.
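A simplified sketch modeled on that bank account example (not a verbatim copy of the documentation; the balance accessor and the account names are assumptions added for illustration). Each call to open.account creates a fresh environment holding its own total:

```r
# Factory: every call returns closures bound to a new, private `total`.
open.account <- function(total) {
    list(
        deposit = function(amount) {
            if (amount <= 0) stop("Deposits must be positive")
            total <<- total + amount   # updates this account's own total
            invisible(total)
        },
        balance = function() total
    )
}

ross <- open.account(100)
robert <- open.account(200)
ross$deposit(50)
ross$balance()     # 150
robert$balance()   # 200, unaffected by the deposit into ross's account
```

Contrast this with local(), which evaluates its block once and so yields a single shared state rather than one per call.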

As @otsaw mentions, memoization might be implemented using local(), e.g., to cache web sites in a crawler:

library(XML)
crawler <- local({
    seen <- new.env(parent=emptyenv())
    .do_crawl <- function(url, base, pattern) {
        if (!exists(url, seen)) {
            message(url)
            xml <- htmlTreeParse(url, useInternalNodes=TRUE)
            hrefs <- unlist(getNodeSet(xml, "//a/@href"))
            urls <-
                sprintf("%s%s", base, grep(pattern, hrefs, value=TRUE))
            seen[[url]] <- length(urls)
            for (url in urls)
                .do_crawl(url, base, pattern)
        }
    }
    .do_report <- function() {
        urls <- as.list(seen)
        data.frame(Url=names(urls), Links=unlist(unname(urls)),
                   stringsAsFactors=FALSE)
    }
    list(crawl=function(base, pattern="^/.*html$") {
        .do_crawl(base, base, pattern)
    }, report=.do_report)
})

crawler$crawl(favorite_url)
dim(crawler$report())

(The usual example of memoization, Fibonacci numbers, is not satisfying: the range of Fibonacci numbers that R's numeric representation can hold without overflow is small, so one would probably use a look-up table of efficiently pre-calculated values.) Note that crawler here is a singleton; it could as easily have followed a factory pattern, with one crawler per base URL.
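The caching idea can be tried without network access. The following is only a sketch: slow_length is a hypothetical stand-in for an expensive operation such as fetching a page, and cached_length and its calls counter are names invented for this illustration.

```r
# Hypothetical stand-in for an expensive operation.
slow_length <- function(key) {
    nchar(key)
}

cached_length <- local({
    seen <- new.env(parent = emptyenv())   # private cache, like `seen` in crawler
    calls <- 0                             # how often the slow path actually ran
    list(
        get = function(key) {
            if (!exists(key, envir = seen, inherits = FALSE)) {
                calls <<- calls + 1
                seen[[key]] <- slow_length(key)
            }
            seen[[key]]
        },
        calls = function() calls
    )
})

cached_length$get("abc")   # computed on first use: 3
cached_length$get("abc")   # served from the cache: 3
cached_length$calls()      # 1
```

Because local() runs once, there is exactly one cache; wrapping the same body in a function instead would give one cache per call.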

Martin Morgan answered Oct 01 '22 19:10


Encapsulation

The advantage of this style of programming is that the hidden objects are unlikely to be overwritten by anything else, so you can be more confident that they contain what you think. They won't be used by mistake, since they can't readily be accessed. In the post linked to in the question there is a global variable, count, which could be accessed and overwritten from anywhere, so if we are debugging code, look at count and see that it has changed, we cannot really be sure what part of the code changed it. In contrast, in the example code of the question we have greater assurance that no other part of the code is involved.
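A small sketch of the difference, using two hypothetical counter functions (f_global and f_local are names invented here, not taken from the linked post):

```r
# Global state: any code can overwrite `count`.
count <- 0
f_global <- function() {
    count <<- count + 1
    count
}

# Hidden state: this `count` lives in the environment created by local().
f_local <- local({
    count <- 0
    function() {
        count <<- count + 1
        count
    }
})

f_global()       # 1
f_local()        # 1
count <- 100     # an unrelated assignment elsewhere in a script
f_global()       # 101: the global counter was clobbered
f_local()        # 2: the hidden counter is unaffected
```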

Note that we actually can access the hidden function, although it's not that easy:

# run hidden.fn
environment(example)$hidden.fn()

Object Oriented Programming

Also note that this is very close to object oriented programming where example and hidden.fn are methods and hidden.x is a property. We could do it like this to make it explicit:

library(proto)
p <- proto(x = "x", 
  fn = function(.) cat(' "fn()"\n '),
  example = function(.) .$fn()
)
p$example() # prints "fn()"

proto does not hide x and fn, but it's not easy to access them by mistake, since you must write p$x and p$fn() to reach them. That is not really so different from being able to write e <- environment(example); e$hidden.fn()

EDIT:

The object oriented approach does add the possibility of inheritance, e.g. one could define a child of p which acts like p except that it overrides fn.

ch <- p$proto(fn = function(.) cat("Hello from ch\n")) # child
ch$example() # prints: Hello from ch
G. Grothendieck answered Oct 01 '22 21:10