
parallel computations on Reference Classes

I have a list of fairly large objects that I want to apply a complicated function to in parallel, but my current method uses too much memory. I thought Reference Classes might help, but using mclapply to modify them doesn't seem to work.

The function modifies the object itself, so I overwrite the original object with the new one. Since the object is a list and I'm only modifying a small part of it, I was hoping that R's copy-on-modify semantics would avoid making multiple copies; in practice, that doesn't seem to be the case for what I'm doing. Here's a small example of the base R approach I have been using. It correctly resets the balance to zero.

## mclapply lives in the parallel package
library(parallel)

## make a list of accounts, each with a balance
## and a function to reset the balance
foo <- lapply(1:5, function(x) list(balance=x))
reset1 <- function(x) {x$balance <- 0; x}
foo[[4]]$balance
## 4 ## BEFORE reset
foo <- mclapply(foo, reset1)
foo[[4]]$balance
## 0 ## AFTER reset

It seems that Reference Classes might help, as they are mutable, and with lapply they behave as I expect: the balance is reset to zero.

Account <- setRefClass("Account", fields=list(balance="numeric"),
                       methods=list(reset=function() {balance <<- 0}))

foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(lapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 0

But when I use mclapply, the balance is not reset. Note that if you're on Windows or have mc.cores=1, lapply will be called instead, so you won't see the problem there.

foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(mclapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 4

What's going on? How can I work with Reference Classes in parallel? Is there a better way altogether to avoid unnecessary copying of objects?

Aaron left Stack Overflow asked Dec 06 '13 18:12


1 Answer

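I think the forked processes, while they can read all the variables in the workspace, cannot change the parent's copies of them: each child modifies its own copy, and only the function's return value comes back to the parent. Returning the object from the function works, but I don't know yet whether it improves the memory issues.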

foo <- mclapply(foo, function(x) {x$reset(); x})
foo[[4]]$balance
## 0
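
As a rough sketch of the explanation above (not a measured result), the following compares what a worker sees with what the parent sees, and then tries one possible way to avoid sending whole objects back: return only the changed value and assign it in the parent. The names seen_in_worker, bar and new_balances are made up for this illustration, and the behaviour described in the comments assumes forking is available (not Windows, and more than one core in use).

```r
library(parallel)

Account <- setRefClass("Account", fields=list(balance="numeric"),
                       methods=list(reset=function() {balance <<- 0}))
foo <- lapply(1:5, function(x) Account$new(balance=x))

## Each worker resets its own forked copy and reports the balance it sees.
seen_in_worker <- mclapply(foo, function(x) {x$reset(); x$balance})
unlist(seen_in_worker)               ## balances as seen inside the workers
sapply(foo, function(x) x$balance)   ## balances in the parent session

## Returning the object sends a serialized copy back to the parent, so
## with forked workers this is expected to be a different object.
bar <- mclapply(foo, function(x) {x$reset(); x})
identical(bar[[4]], foo[[4]])

## Hypothetical alternative: compute only the new value in the worker
## and update the existing objects in the parent.
new_balances <- mclapply(foo, function(x) 0)
for (i in seq_along(foo)) foo[[i]]$balance <- new_balances[[i]]
sapply(foo, function(x) x$balance)
```

Whether the last variant actually reduces memory use would depend on how large the returned pieces are compared with the full objects; I have not profiled it.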
Aaron left Stack Overflow answered Oct 15 '22 06:10