 

R: caching/memoise for environments

I would like to use memoization to cache the results of certain expensive operations so that they are not computed over and over again.

Both memoise and R.cache fit my needs. However, I am finding that caching is not robust across calls.

Here is an example that demonstrates the problem I'm seeing:

library(memoise)
library(proto)

# Memoisation works: b() computes its value once, then returns the cached result
a <- function(x) runif(1)
replicate(5, a())
b <- memoise(a)
replicate(5, b())

# Memoisation fails: fn() runs again on every call to calc()
ProtoTester <- proto(
  calc = function(.) {
    fn <- function() print(runif(1))
    mfn <- memoise(fn)
    invisible(mfn())
  }
)
replicate(5, ProtoTester$calc())
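The same behaviour can be reproduced without proto. This is a minimal sketch; the `calls` environment is a hypothetical counter added only to observe how often the expensive body actually runs. Because `memoise()` is called inside `calc()`, every call builds a new memoised function with its own empty cache, so nothing is ever reused.

```r
library(memoise)

# Hypothetical call counter, used only to observe cache misses
calls <- new.env()
calls$n <- 0

calc <- function() {
  fn <- function() {
    calls$n <- calls$n + 1  # record that the expensive body ran
    runif(1)
  }
  mfn <- memoise(fn)  # a brand-new memoised function (and empty cache) per call
  mfn()
}

res <- replicate(5, calc())
calls$n  # 5: the cache never survives long enough to be reused
```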

Update based on the answer

This question can have different answers depending on whether persistent or non-persistent caching is used. Non-persistent caching (such as memoise) may require the memoised function to be created only once, and the answer below is a nice way to do that. Persistent caching (such as R.cache) works across sessions and should be robust to the function being defined again. The approach above works with R.cache: in the session below, ProtoTester is assigned twice, yet the second replicate() call does not run fn again and returns the cached values. With memoise, fn would run again after the reassignment.

> ProtoTester <- proto(
+     calc = function(.) {
+         fn <- function() print(runif(1))
+         invisible(memoizedCall(fn))
+     }      
+ )
> replicate(5, ProtoTester$calc())
[1] 0.977563
[1] 0.1279641
[1] 0.01358866
[1] 0.9993092
[1] 0.3114813
[1] 0.97756303 0.12796408 0.01358866 0.99930922 0.31148128
> ProtoTester <- proto(
+     calc = function(.) {
+         fn <- function() print(runif(1))
+         invisible(memoizedCall(fn))
+     }      
+ )
> replicate(5, ProtoTester$calc())
[1] 0.97756303 0.12796408 0.01358866 0.99930922 0.31148128
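With memoise, a cache can also be made to outlive the memoised wrapper by creating one cache object up front and passing it explicitly, which is closer to the R.cache behaviour shown above. This is a minimal sketch, assuming memoise >= 2.0 (which accepts cachem caches through its `cache` argument, with cachem installed as a dependency) and assuming that memoise keys entries on the function's formals and body plus its arguments rather than on the identity of the closure. `shared_cache`, `make_mfn()` and `calls` are hypothetical names used only for illustration.

```r
library(memoise)

# Hypothetical call counter, used only to observe cache misses
calls <- new.env()
calls$n <- 0

# One cache object shared by every wrapper built below
# (cachem::cache_disk(dir) would also keep the entries across R sessions)
shared_cache <- cachem::cache_mem()

make_mfn <- function() {
  fn <- function() {
    calls$n <- calls$n + 1  # record that the expensive body ran
    runif(1)
  }
  memoise(fn, cache = shared_cache)  # new wrapper each time, same cache
}

res <- replicate(5, make_mfn()())
calls$n              # 1 under the keying assumption above
length(unique(res))  # 1
```

If the keying assumption does not hold for the installed memoise version, the counter would instead reach 5, exactly as in the question.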

The reason I thought I had a problem with R.cache is that I was passing a proto method as the function to memoizedCall(). proto methods are bound to environments in ways that R.cache has a hard time with. In that case you have to unbind the function (turn the instantiated method back into a plain function) and then pass the object manually as its first argument. The following example shows how this works (both Report and Report$loader are proto objects):

# This will not memoize the call
memoizedCall(Report$loader$download_report)

# This works as intended
memoizedCall(with(Report$loader, download_report), Report$loader)
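The unbinding step itself can be illustrated with proto alone. This is a minimal sketch; `loader`, `name` and `download_report` below are hypothetical stand-ins for the question's Report$loader, not its real code. It shows that `with(obj, method)` returns the plain function stored in the object, whose first argument is `.`, and that calling it with the object passed explicitly gives the same result as the bound method. That explicit form is what gets handed to memoizedCall() above.

```r
library(proto)

# Hypothetical stand-in for Report$loader and its download_report method
loader <- proto(
  name = "weekly",
  download_report = function(.) paste("report:", .$name)
)

# Accessing the method with `$` returns a wrapper with `.` already bound to loader
bound <- loader$download_report

# Evaluating the name inside the object returns the plain, unbound function
unbound <- with(loader, download_report)
names(formals(unbound))  # "."

# Same result either way; the unbound form takes the object explicitly
bound()
unbound(loader)
```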

I'd love to know why R.cache works with normal functions bound to environments but fails with proto instantiated methods.

asked Jul 06 '12 07:07 by Sim



2 Answers

In your code, the function is memoised anew each time calc() is called, so every call starts with an empty cache. The following should work: fn is memoised only once, when ProtoTester is defined.

library(memoise)
library(proto)

ProtoTester <- proto(
  calc = {
    # This block is evaluated once, when ProtoTester is created
    fn <- function() print(runif(1))
    mfn <- memoise(fn)
    function(.) mfn()
  }
)
replicate(5, ProtoTester$calc())
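Outside proto, the same fix amounts to calling `memoise()` once, when the object is built, and keeping the memoised function in an enclosing environment. This is a minimal sketch; `make_tester()` and the `calls` counter are hypothetical names used only for illustration.

```r
library(memoise)

# Hypothetical call counter, used only to observe cache misses
calls <- new.env()
calls$n <- 0

make_tester <- function() {
  fn <- function() {
    calls$n <- calls$n + 1  # record that the expensive body ran
    runif(1)
  }
  mfn <- memoise(fn)  # memoised once, when the tester is created
  list(calc = function() mfn())
}

tester <- make_tester()
res <- replicate(5, tester$calc())
calls$n              # 1: later calls are served from mfn's cache
length(unique(res))  # 1
```

Calling `make_tester()` again would start from an empty cache, which is the non-persistent behaviour described in the question's update.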
answered Sep 22 '22 04:09 by Vincent Zoonekynd



An alternative solution is to evaluate the expression with evals() from my pander package, which has an internal caching engine (either temporary, in an environment for the current R session, or persistent, with disk storage). A short example based on your code:

library(pander)
library(proto)

ProtoTester <- proto(
  calc = function(.) {
    fn <- function() runif(1)
    # evals() returns one element per evaluated expression; $result holds its value
    mfn <- evals('fn()')[[1]]$result
    invisible(mfn)
  }
)

Running it with the evals cache turned off and then on gives:

> evals.option('cache', FALSE)
> replicate(5, ProtoTester$calc())
[1] 0.7152186 0.4529955 0.4160411 0.1166872 0.8776698

> evals.option('cache', TRUE)
> evals.option('cache.time', 0)
> replicate(5, ProtoTester$calc())
[1] 0.7716874 0.7716874 0.7716874 0.7716874 0.7716874

Please note that evals.option() is to be renamed to evalsOption soon to avoid R CMD check warnings about S3 methods.

answered Sep 25 '22 04:09 by daroczig
