I have the following question.
Why does everything run fine when I submit the job to a standard node (maximum 56 cores), but when I submit the same job/code to the large_memory node (maximum 128 cores), I get an error?
> no_cores <- detectCores() - 1
> cl <- makeCluster(no_cores, outfile=paste0('./info_parallel.log'))
Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b", : cannot open the connection
Calls: <Anonymous> ... doTryCatch -> recvData -> makeSOCKmaster -> socketConnection
In addition: Warning message:
In socketConnection(master, port = port, blocking = TRUE, open = "a+b", : localhost:11232 cannot be opened
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
As I said, the R code runs fine on the standard nodes, so I assume it is a problem with the large_memory node. What can that be?
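Finally, I solved it.
The error was caused by R's default limit on open connections. An R session can hold at most 128 connections by default, and three of those are reserved for stdin, stdout, and stderr. Here, each worker started by makeCluster() takes one socket connection on the master, so the number of workers, which the code derives from the number of cores per node, counts against that limit.
In the code, the error occurred at the line "cl <- makeCluster(...)":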
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores, outfile=paste0('./info_parallel.log'))
Here, detectCores() returns the total number of cores on the node.
On the standard nodes of the cluster, the number of cores per node is well below 128, which is why the R code runs fine there. In the large_memory partition, the number of cores per node is 128 in my case, so detectCores() - 1 asks for 127 workers, which hits the default connection limit. So the error shows as:
cannot open the connection
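The arithmetic behind this can be checked with a short sketch. It assumes R's default limit of 128 connections with three taken by the standard streams; the variable names are made up for illustration, and the core counts are the ones from the question.

```r
# Illustrative arithmetic only; all variable names are hypothetical.
max_connections <- 128L  # R's default limit on open connections
reserved <- 3L           # stdin, stdout, stderr
usable <- max_connections - reserved

requested_standard <- 56L - 1L  # detectCores() - 1 on a 56-core standard node
requested_large <- 128L - 1L    # detectCores() - 1 on a 128-core large_memory node

requested_standard <= usable  # TRUE: enough connections for every worker
requested_large <= usable     # FALSE: more workers than usable connections
```

Under these assumptions the large_memory request is over the limit, which is consistent with the error above.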
I then set the number of workers to 120 for jobs on the large_memory node (maximum cores = 128). No errors; the code works well.
cl <- makeCluster( 120, outfile=paste0('./info_parallel.log'))
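To avoid hard-coding 120 for one partition, the worker count can be capped so the same script runs on both node types. This is only a sketch: safe_worker_count() is a hypothetical helper, and the cap of 120 is simply the value that worked above, not an official constant.

```r
library(parallel)

# Hypothetical helper: keep the worker count below R's connection limit.
# The default cap of 120 is the value that worked above (an assumption).
safe_worker_count <- function(detected = detectCores(), cap = 120L) {
  if (is.na(detected)) detected <- 2L  # detectCores() may return NA
  # Leave one core free, never exceed the cap, never go below one worker.
  max(1L, min(detected - 1L, cap))
}

no_cores <- safe_worker_count()
cl <- makeCluster(no_cores, outfile = "./info_parallel.log")
# ... run parLapply(cl, ...) or similar work here ...
stopCluster(cl)
```

On a 56-core node this still gives 55 workers, and on a 128-core node it gives 120 instead of 127.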
Thanks!