How to avoid 'sink stack is full' error when sink() is used to capture messages in foreach loop

Tags: foreach, r, cat, sink

In order to see the console messages output by a function running in a foreach() loop, I followed the advice of this guy and added a sink() call like so:

   library(foreach)    
   library(doMC)
   cores <- detectCores()
   registerDoMC(cores)

   X <- foreach(i=1:100) %dopar%{
   sink("./out/log.branchpies.txt", append=TRUE)
   cat(paste("\n","Starting iteration",i,"\n"), append=TRUE)
   myFunction(data, argument1="foo", argument2="bar")
   }

However, at iteration 77 I got the error 'sink stack is full'. There are well-answered questions about avoiding this error in for loops, but not in foreach loops. What's the best way to write the otherwise-hidden foreach output to a file?
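For context: each sink() call with a file pushes a new diversion onto R's sink stack, and only a bare sink() pops one. So a loop body that opens a sink on every iteration without ever closing it keeps growing the stack until it overflows. The minimal sketch below reproduces that growth sequentially with a plain for loop (no foreach; the temporary log file is only for the demo) and then unwinds the stack again:

```r
# Sequential illustration (no foreach): every sink(<file>) pushes a diversion,
# every bare sink() pops one.
log_file <- tempfile(fileext = ".txt")
start_depth <- sink.number()          # diversions already open before the demo

for (i in 1:3) {
  sink(log_file, append = TRUE)       # push: stack depth grows by one
  cat("Starting iteration", i, "\n")  # written to the top diversion
}
depth_after_loop <- sink.number()     # three more diversions are still open

# Unwind every diversion this demo opened so output returns to the console
while (sink.number() > start_depth) sink()
```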

Roger asked Oct 10 '14 09:10


3 Answers

This runs without errors on my Mac:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  sink("log.branchpies.txt", append=TRUE)
  cat(paste("\n","Starting iteration",i,"\n"))
  sink() #end diversion of output
  rnorm(i*1e4)
}
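The important part above is the closing sink(). It can still be skipped, leaving a diversion open, if the code between the two calls throws an error. A minimal sketch of one way to guard against that, assuming a hypothetical helper `with_log()` that registers the closing sink() with on.exit(). It is shown with base lapply() so it runs without a parallel backend; the same `with_log(...)` call could go inside a %dopar% body:

```r
# Hypothetical helper: evaluate `expr` with console output diverted to
# `log_file`, and pop that diversion however the function exits (even on error).
with_log <- function(log_file, expr) {
  sink(log_file, append = TRUE)
  on.exit(sink(), add = TRUE)  # runs on normal return and on error
  expr                         # lazy evaluation: expr is forced here, while the sink is active
}

log_file <- tempfile(fileext = ".txt")
start_depth <- sink.number()

results <- lapply(1:5, function(i) {
  tryCatch(
    with_log(log_file, {
      cat("Starting iteration", i, "\n")
      if (i == 3) stop("simulated failure")  # stand-in for an error in the real work
      i^2
    }),
    error = function(e) NA  # record the failure and carry on
  )
})
```

Even though iteration 3 fails, the stack depth is back where it started afterwards, and all five "Starting iteration" lines reach the log.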

This is better:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)
sink("log.branchpies.txt", append=TRUE)
X <- foreach(i=1:100) %dopar%{
  cat(paste("\n","Starting iteration",i,"\n"))
  rnorm(i*1e4)
}
sink() #end diversion of output

This works too:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  cat(paste("\n","Starting iteration",i,"\n"), 
       file="log.branchpies.txt", append=TRUE)
  rnorm(i*1e4)
}
Roland answered Nov 15 '22 10:11


As suggested by this guy, it is quite tricky to keep track of the sink stack. It is therefore advisable to use cat's ability to write directly to a file, as suggested in the answer above:

cat(..., file="log.txt", append=TRUE)

To save some typing, you could create a wrapper function that sends its output to the log file every time it is called:

catf <- function(..., file="log.txt", append=TRUE){
  cat(..., file=file, append=append)
}
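Because cat() with a character `file` opens the file, writes to it and closes it again on every call, this approach never touches the sink stack. A short sequential check of the wrapper (the temporary file stands in for "log.txt" only for the demo):

```r
# The wrapper from above
catf <- function(..., file = "log.txt", append = TRUE) {
  cat(..., file = file, append = append)
}

log_file <- tempfile(fileext = ".txt")  # demo path instead of "log.txt"
start_depth <- sink.number()

for (i in 1:3) {
  catf("Starting iteration", i, "\n", file = log_file)
}

readLines(log_file)            # one line per iteration
sink.number() == start_depth   # the sink stack was never used
```

Note that cat() puts its separator (a space by default) before the "\n", so each logged line ends with a trailing space.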

Then, when you call foreach, you would use something like this:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  catf(paste("\n","Starting iteration",i,"\n"))
  rnorm(i*1e4)
}

Hope it helps!

dmi3kno answered Nov 15 '22 10:11


Unfortunately, none of the above approaches worked for me: with sink() inside the foreach() loop, I kept getting the "sink stack is full" error, and with sink() outside the loop, the file was created but never updated.

For me, the easiest way to create a log file that tracks the progress of a parallelised foreach() loop is the good old write.table() function.

    library(foreach)
    library(doParallel)

    availableClusters <- makeCluster(detectCores() - 1) # use all CPU threads but one (i.e. one is reserved for the OS)
    registerDoParallel(availableClusters) # register the cluster as the backend for %dopar%

    x <- foreach(i = 1:100) %dopar% {
           log.text <- paste0(Sys.time(), " processing loop run ", i, "/100")
           write.table(log.text, "loop-log.txt", append = TRUE, row.names = FALSE, col.names = FALSE)

           # your statements here
    }

    stopCluster(availableClusters) # shut down the worker processes when done

And don't forget (as I did several times...) to use append = TRUE within write.table().
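One detail worth knowing: write.table() quotes character values by default, so every log line written above ends up wrapped in double quotes. A small sequential sketch with quote = FALSE added (temporary file, and a fixed message without Sys.time() so the output is predictable):

```r
log_file <- tempfile(fileext = ".txt")

for (i in 1:3) {
  log.text <- paste0("processing loop run ", i, "/3")
  # quote = FALSE stops write.table() from wrapping each line in double quotes
  write.table(log.text, log_file, append = TRUE, quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

readLines(log_file)
```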

ChristianB answered Nov 15 '22 11:11