doParallel error in R: Error in serialize(data, node$con) : error writing to connection

The functions serialize and unserialize are called by the master process to communicate with the workers when using a socket cluster. If you get an error from either of those functions, it usually means that at least one of the workers has died. On a Linux machine, it might have died because the machine was almost out of memory, so the out-of-memory killer decided to kill it, but there are many other possibilities.

I suggest that you use the outfile="" option of makeCluster when creating the cluster object so that output from the workers is displayed. If you're lucky, you'll get an error message from a worker before it dies that helps you solve the problem.
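
For example, a minimal sketch of that setup (the worker count of 4 and the loop body are just illustrations):

library(doParallel)
# outfile = "" sends worker output (including error messages) to the
# master's console instead of discarding it (the default is /dev/null)
cl <- makeCluster(4, outfile = "")
registerDoParallel(cl)
# Any cat()/message()/error output from the workers now shows up here
res <- foreach(i = 1:4) %dopar% {
  cat("worker", i, "alive\n")
  sqrt(i)
}
stopCluster(cl)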


I had the same problem when I tried to use all 8 cores of my machine. When I left one free, the problem went away. I believe the system needs one core left free for its own service tasks; otherwise you'll get an error:

library(doParallel)
# Find out how many cores are available (if you don't already know)
cores <- detectCores()
# Create a cluster with the desired number of cores, leaving one free
# for the machine's own processes
cl <- makeCluster(cores - 1)
# Register the cluster
registerDoParallel(cl)
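
A short usage sketch to go with the setup above (the loop body is only illustrative): run your foreach loop on the registered backend, then release the workers when you're done:

# Run work on the registered backend; the loop body is illustrative
results <- foreach(i = 1:100, .combine = c) %dopar% sqrt(i)
# Release the workers when finished
stopCluster(cl)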

I received a similar error in the following situation, where I terminated my model training early and then tried to run it again. Here is an example using the caret package to train a model, but I think it applies to any application that involves parallel processing.

> cluster <- makeCluster(10)
> registerDoParallel(cluster)
> train(..., trControl = trainControl(allowParallel = TRUE))
# Terminated before completion
> train(..., trControl = trainControl(allowParallel = TRUE))
Error in serialize(data, node$con) : error writing to connection

I closed the cluster and reinitialized it:

stopCluster(cluster)          # shut down the stale workers
registerDoSEQ()               # point foreach back at the sequential backend
cluster <- makeCluster(10)    # start fresh workers
registerDoParallel(cluster)   # register the new cluster

I did not see the error when running the model again. Sometimes turning it off and back on again really can be the solution.
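
If this happens often, a small helper along these lines (hypothetical, not from the original answer) can bundle the reset; stopCluster is wrapped in try in case the workers have already died:

reset_parallel <- function(old_cluster, cores = 10) {
  try(stopCluster(old_cluster), silent = TRUE)  # workers may already be dead
  registerDoSEQ()               # fall back to the sequential backend first
  new_cluster <- makeCluster(cores)
  registerDoParallel(new_cluster)
  new_cluster
}
cluster <- reset_parallel(cluster)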


After receiving this error message, I changed my code to a non-parallel for loop. Then I received the error message "cannot allocate vector of size *** Gb". I suspect the parallel failure was caused by the same thing; it just produces a different error message.
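
One way to check whether memory is the real culprit, assuming your loop is written with foreach, is to swap %dopar% for %do% so the loop runs sequentially in the master process and the underlying error surfaces directly (the loop body here is a stand-in):

library(foreach)
# Sequential run: the true error (e.g. "cannot allocate vector of size ...")
# is reported directly instead of as a dead-worker serialize() failure
res <- foreach(i = 1:10) %do% {
  matrix(rnorm(1e6), nrow = 1000)  # stand-in for the real per-iteration work
}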


Each worker you start consumes its own memory, so more cores means more memory being demanded; as soon as you run out of it, you will receive this error. My suggestion is to reduce the number of cores used for parallelization.
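
A minimal sketch of that advice (the cap of 4 is just an illustration, matching what worked in my case below; tune it to your own memory budget):

library(doParallel)
# Cap the worker count well below detectCores() when each worker
# holds a large copy of the data
n_workers <- min(4, detectCores() - 1)
cl <- makeCluster(n_workers)
registerDoParallel(cl)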

With 8 cores and 32 GB of memory available myself, I tried using 7 and then 6 cores and ran into a similar error. I then decided to dedicate only 4 cores, at which point the run consumed around 70% of the memory:

[Screenshot: system monitor showing memory usage at roughly 70% with 4 cores in use.]

One more core probably would have worked.
