Is it possible to run multiple chains with JAGS on multiple cores (subdividing chains)

I’m wondering whether it’s possible to split 3 JAGS chains across, for example, 5 or 6 cores. Here is my code:

  library(parallel)
  library(R2jags)  # jags.parallel() comes from the R2jags package

  # Note: there is no progress bar when running chains in parallel
  fit <- jags.parallel(data = d$data,
                       inits = d$inits,
                       parameters.to.save = d$params,
                       model.file = model.jags,
                       n.chains = 3,
                       n.thin = 10,
                       n.iter = 9000,
                       n.burnin = 3000,
                       working.directory = NULL,
                       n.cluster = 3)  # number of worker processes (cores) to use

As you can see, the number of chains (n.chains, which is 3 in my case) equals the number of cores used (n.cluster); this is the default.

  1. How does this influence the way the MCMC is sampled?
  2. Is there an optimal number of cores to use with R when running MCMC chains in parallel?
  3. I saw that I cannot use fewer than 3 cores if I have 3 chains. It gives me this error: Error in res[[ch]] : subscript out of bounds. Why?
  4. If I increase the number of cores, it takes longer (for comparison, with 12 cores it takes 7.2 times as long as with 3 cores)! Shouldn’t it be the reverse?
  5. How can I make the script faster without removing iterations or burn-in, or adding thinning (more cores? more RAM?)?

My computer has 16 cores, so I have flexibility in the number of cores (it also has 64 GB of RAM and a 3 GHz Intel Xeon E5 processor).

asked May 24 '16 at 15:05 by M. Beausoleil



1 Answer

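It is not possible to split 3 chains across more than 3 cores. An MCMC chain is sequential (each iteration depends on the previous one), so each chain runs on a single core. When running JAGS in parallel, here is effectively what happens: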

  1. Run the specified burn-in for each chain. In your example, each of the three chains would run the model for 3000 iterations and discard those draws.

  2. Once a chain has finished its burn-in, it keeps sampling until it reaches n.iter total iterations (in R2jags, n.iter includes the burn-in), storing every n.thin-th draw. In your example, each chain would keep (n.iter - n.burnin)/n.thin = (9000 - 3000)/10 = 600 samples, or 1800 across the three chains.
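The two steps above can be sketched in plain R. This is a hypothetical illustration, not what R2jags or JAGS do internally: a toy random-walk Metropolis sampler for a standard normal stands in for a JAGS chain, and base R's `parallel` package (`makeCluster()`, `parLapply()`) runs one chain per worker process. It shows why a chain occupies exactly one core and where the 600 stored samples per chain come from. `run_chain()` and `run_chains_parallel()` are made-up names.

```r
library(parallel)

# Hypothetical stand-in for one JAGS chain: a random-walk Metropolis
# sampler targeting a standard normal. Each iteration depends on the
# previous state `x`, so a single chain cannot be spread over several cores.
run_chain <- function(seed, n.iter, n.burnin, n.thin) {
  set.seed(seed)                              # distinct seed per chain
  n.keep <- (n.iter - n.burnin) %/% n.thin    # draws stored per chain
  kept <- numeric(n.keep)
  j <- 0
  x <- 0
  for (i in seq_len(n.iter)) {
    proposal <- x + rnorm(1)
    # Metropolis acceptance step on the log scale
    if (log(runif(1)) < dnorm(proposal, log = TRUE) - dnorm(x, log = TRUE)) {
      x <- proposal
    }
    # Step 1: burn-in draws are discarded.
    # Step 2: afterwards, keep every n.thin-th draw.
    if (i > n.burnin && (i - n.burnin) %% n.thin == 0) {
      j <- j + 1
      kept[j] <- x
    }
  }
  kept
}

# One worker process per chain, mirroring n.cluster = n.chains
run_chains_parallel <- function(n.chains, n.iter, n.burnin, n.thin) {
  cl <- makeCluster(n.chains)
  on.exit(stopCluster(cl))
  parLapply(cl, seq_len(n.chains), run_chain,
            n.iter = n.iter, n.burnin = n.burnin, n.thin = n.thin)
}

chains <- run_chains_parallel(n.chains = 3, n.iter = 9000,
                              n.burnin = 3000, n.thin = 10)
sapply(chains, length)  # 600 600 600, i.e. (9000 - 3000) / 10 per chain
```

Adding a fourth worker here would not shorten any chain: it would simply have no chain to run.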

So, let's move on to your questions (#1 is answered above).

  2. The answer depends on what else you are doing on that computer at the time. You never want to run on all K cores of your computer, because that leaves almost no computing power for anything else. For larger models, I generally run K-1 chains on K-1 cores; for simple models, it does not really matter.

  3. You could run multiple chains on fewer cores, but that slows things down, because chains sharing a core have to be computed sequentially. Conversely, you cannot farm out fewer chains onto more cores, since a single chain cannot be split. If you have x chains, you should not use more than x cores.

  4. This follows from the answers to questions 2 and 3. Adding chains increases the total amount of computation, but adding cores without adding chains will not make the run faster.

  5. This really cannot be answered without seeing your model.
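The rules of thumb in answers 2 and 3 (leave one core free, and never start more workers than chains) can be written as a small helper. `choose_n_cluster()` is a hypothetical name, not an R2jags function; `detectCores()` is from base R's `parallel` package. With more chains than workers, generic tools such as `parLapply()` run the extra chains one after another on the same worker; the question reports that `jags.parallel()` instead errors when `n.cluster` is smaller than `n.chains`, so with that function it is safest to keep the two equal.

```r
library(parallel)

# Hypothetical helper (not part of R2jags): choose a worker count that
# leaves one core free and never exceeds the number of chains, because a
# single chain cannot be split across cores.
choose_n_cluster <- function(n.chains, n.cores = detectCores()) {
  if (is.na(n.cores)) n.cores <- 1   # detectCores() may return NA
  usable <- max(1, n.cores - 1)      # keep one core for everything else
  min(n.chains, usable)
}

choose_n_cluster(3, n.cores = 16)    # 3: extra cores would sit idle
choose_n_cluster(20, n.cores = 16)   # 15: one core left free

# With more chains than workers, some workers run several chains one
# after another, so wall time grows in "rounds" of chains.
ceiling(20 / choose_n_cluster(20, n.cores = 16))  # 2
```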

answered Sep 28 '22 at 00:09 by mfidino
