
What exactly does the first argument of the makeCluster function do?

Tags: r, rparallel

I am new to R programming, as you can tell from the nature of my question. I am trying to take advantage of the parallel computing ability of the train function.

library(parallel)
# Detect the number of physical cores available to the parallel package
nCores <- detectCores(logical = FALSE)
cat(nCores, "cores detected.\n")

# Detect the number of logical cores (threads) with detectCores(logical = TRUE)
nThreads <- detectCores(logical = TRUE)
cat(nThreads, "threads detected.\n")

# Create doSNOW compute cluster (try 64)
# One can increase up to 128 nodes
# Each node requires 44 Mbyte RAM under WINDOWS.
cluster <- makeCluster(128, type = "SOCK")
class(cluster)

I need someone to help me interpret this code. Originally the first argument of makeCluster() was nThreads, but after running

nCores <- detectCores(logical = FALSE)

I learned that I have 4 threads available. I changed the value to 128 based on the comment in the guide. Will this let me run 128 iterations of the train function simultaneously? If so, what is the point of detecting the number of threads and cores my computer has in the first place?

igbobahaushe asked Oct 28 '25 09:10


1 Answer

What you want to do first is detect the number of cores you have.

nCores <- detectCores() - 1

Most people subtract 1 so that one core stays free for everything else running on the machine.

cluster <- makeCluster(nCores)

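The first argument sets the number of worker processes (nodes) in the cluster your code will run on. Asking for more workers than you have cores generally does not make CPU-bound work any faster, because the extra processes just compete for the same cores, which is why you check detectCores() first. There are several ways to run work in parallel (doParallel, parApply, parLapply, foreach, ...). Whichever one you choose hands each task to one of the workers in the cluster you created.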
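To make the meaning of the first argument concrete, here is a minimal sketch. The names n_detected, n_workers and worker_pids are my own, and the cap of 2 workers is only there to keep the demo light. It guards against detectCores() returning NA, which the documentation says can happen, and then asks every worker for its process ID to show that one R process is started per requested worker.

```r
library(parallel)

# detectCores() may return NA on some platforms, so fall back to 1.
n_detected <- detectCores(logical = FALSE)
if (is.na(n_detected)) n_detected <- 1L

# Assumption for this demo: leave one core free, use at least 1
# and at most 2 workers.
n_workers <- max(1L, min(2L, n_detected - 1L))

# The first argument is the number of worker processes to start.
cluster <- makeCluster(n_workers)  # default type is "PSOCK"

# Ask each worker for its process ID, then shut the workers down.
worker_pids <- unlist(clusterCall(cluster, Sys.getpid))
stopCluster(cluster)

cat(length(worker_pids), "worker process(es) were started\n")
```

Each process ID is distinct and differs from the ID of the session that created the cluster, so the count you pass in is literally the number of extra R processes on your machine.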

A small example from my own code:

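  library(parallel)
  no_cores <- detectCores() - 1        # leave one core free
  cluster <- makeCluster(no_cores)     # start the worker processes
  # docs$text and preProcessChunk come from my own project
  result <- parLapply(cluster, docs$text, preProcessChunk)
  stopCluster(cluster)                 # shut the workers down again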
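Since docs$text and preProcessChunk are not shown, here is a self-contained sketch of the same pattern that can be run as is. The vector texts and the function clean_text are hypothetical stand-ins, and the fixed count of 2 workers is an assumption to keep the example small. The tryCatch wrapper is my addition so that the workers are stopped even if the parallel call fails.

```r
library(parallel)

# Hypothetical stand-ins for docs$text and preProcessChunk.
texts <- c("  Hello World ", "PARALLEL R", " makeCluster ")
clean_text <- function(x) tolower(trimws(x))

cluster <- makeCluster(2)

# parLapply splits `texts` across the workers and applies clean_text
# to each element; `finally` always releases the workers.
result <- tryCatch(
  parLapply(cluster, texts, clean_text),
  finally = stopCluster(cluster)
)

print(unlist(result))
```

The result has the same shape as lapply(texts, clean_text); only the place where the work runs differs.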

I also see that you're using type = "SOCK". That type is handed to the snow package, so it only works if snow is installed. I always use type = "PSOCK". "FORK" also exists, but whether you can use it depends on your OS.

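FORK: "to divide in branches and go separate ways"
Systems: Unix/Mac (not Windows)
Environment: workers are forked copies of the master process, so they see everything already loaded in it

PSOCK: Parallel Socket Cluster
Systems: All (including Windows)
Environment: workers start as fresh R sessions with an empty environment, so variables and packages must be sent or loaded explicitly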
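The practical consequence of the PSOCK environment can be sketched as follows. The variable shift_by and the result names are hypothetical, and 2 workers is an arbitrary choice. The sketch checks whether the workers can see a variable defined in the master session before and after it is copied over with clusterExport().

```r
library(parallel)

shift_by <- 10  # defined only in the master session so far

cluster <- makeCluster(2, type = "PSOCK")

# PSOCK workers are fresh R sessions, so they do not know `shift_by` yet.
seen_before <- unlist(clusterEvalQ(cluster, exists("shift_by")))

# Copy the variable from the current environment to every worker.
clusterExport(cluster, "shift_by", envir = environment())
seen_after <- unlist(clusterEvalQ(cluster, exists("shift_by")))

# Now a worker-side function can use it.
shifted <- unlist(parLapply(cluster, 1:3, function(i) i + shift_by))
stopCluster(cluster)

print(seen_before)
print(seen_after)
print(shifted)
```

With a FORK cluster the export step would not be needed, because the forked workers start from a copy of the master's state.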
D.Dsn answered Oct 29 '25 23:10


