Will RStan run on a supercomputer?

Tags: r, stan

Stan is a new Bayesian analysis package by Gelman et al.

RStan is, I am guessing, a way to call Stan from within R.

Will Stan / RStan run on a supercomputer with a Linux operating system, and if so, can it take advantage of the supercomputer's multiple processors? I have been told that WinBUGS will not run on a Linux machine and/or cannot take advantage of a supercomputer's multiple processors.

I am looking for a way to speed up Bayesian analyses - from weeks to days / hours.

Mark Miller, asked Oct 11 '12 20:10


3 Answers

Here's a concrete parallelization function that takes source code as text:

library(rstan)
library(parallel)

# Compile the model once, then run `chains` chains in parallel and merge them.
# Note: mclapply() relies on forking, so mc.cores > 1 is not supported on Windows.
parallel_stan <- function(code, data, cores = detectCores(), chains = 8, iter = 2000, seed = 1234) {
    cat("parallel_stan: cores=", cores, ", chains=", chains, ", iter=", iter, ", seed=", seed, "\n", sep = "")
    cat("--- Step 1: compile the model (and run it once, very briefly, ignoring its output)\n")
    f1 <- stan(model_code = code, data = data, iter = 1, seed = seed, chains = 1, chain_id = 1)
    cat("--- Step 2: run more chains in parallel\n")
    # Passing the same seed to all chains follows example(sflist2stanfit):
    # when executing in parallel, use the same seed but a different chain_id per chain.
    sflist <- mclapply(
        seq_len(chains),
        function(i) stan(fit = f1, data = data, iter = iter, seed = seed, chains = 1, chain_id = i),
        mc.cores = cores
    )
    cat("--- Finished.\n")
    sflist2stanfit(sflist)
}

Rudolf Cardinal, answered Oct 20 '22 12:10


Stan and rstan should run on any Linux, Mac, or Windows system that supports the dependencies. We have not tested on BSD or Oracle, but we expect them to work with either the g++ or clang compilers (although not with the Oracle compilers).

There is no explicitly parallel code in Stan or rstan but neither is there any code that prevents the binary from being executed by several processes simultaneously. For example, if you use Stan from the command line in a bash shell, you could do something like

./my_model --data=my_data.dump --seed=12345 --chain_id=1 --samples=samples_1.csv &
./my_model --data=my_data.dump --seed=12345 --chain_id=2 --samples=samples_2.csv &

and so forth for as many chains as you like. It is important to use the same seed but different chain_id when executing in parallel.
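
For more than a handful of chains, the command lines above can be generated rather than typed. A minimal sketch in R, assuming the binary name, data file, and flags are exactly those shown in the two commands above; the helper name chain_commands is hypothetical and not part of Stan:

```r
# Hypothetical helper (not part of Stan): build one shell command per chain.
# Every chain gets the same seed but its own chain_id and output file,
# mirroring the two command lines shown above.
chain_commands <- function(chains, seed = 12345,
                           binary = "./my_model", data = "my_data.dump") {
    ids <- seq_len(chains)
    sprintf("%s --data=%s --seed=%d --chain_id=%d --samples=samples_%d.csv &",
            binary, data, as.integer(seed), ids, ids)
}

cmds <- chain_commands(4)
writeLines(cmds)  # paste into a bash shell, or save to a script file
```

Only chain_id and the output file name vary between the generated commands, so the same-seed rule is hard to break by accident.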

If you are using the rstan package, you can call the main stan() function using any of the parallel engines supported by R and your operating system. Again, it is best to pass the same seed and different chain_id. As of rstan v1.0.3 (not released yet), there is a function called sflist2stanfit() that takes a list of stanfit objects that may have been generated in parallel and combines them into a single stanfit object for analysis.
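
One parallel engine that works on every platform R supports, including Windows, is a socket cluster from the parallel package. Below is a minimal sketch, not a tested rstan recipe: the helper run_chains and the stand-in demo_worker are hypothetical names. In real use the worker would call stan() with chains = 1 and chain_id = i, and the resulting list would be passed to sflist2stanfit().

```r
library(parallel)

# Hypothetical helper: run worker(i, seed) once per chain on a socket cluster.
# All chains share one seed and differ only in chain id, as recommended above.
run_chains <- function(worker, chains = 4, cores = 2, seed = 12345) {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))  # always shut the worker processes down
    parLapply(cl, seq_len(chains), worker, seed = seed)
}

# Stand-in worker so the sketch runs without rstan installed.
# With rstan, the body would be something like
#   rstan::stan(fit = f1, data = data, seed = seed, chains = 1, chain_id = i)
# with f1 and data passed to the workers as extra arguments (an assumption here).
demo_worker <- function(i, seed) list(chain_id = i, seed = seed)

results <- run_chains(demo_worker, chains = 4, cores = 2)
```

Unlike forking, a socket cluster starts fresh R processes, so anything the worker needs (the compiled fit, the data, loaded packages) has to be sent to it explicitly.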

For more information, there is a thread devoted to parallel execution at

https://groups.google.com/d/topic/stan-users/3goteHAsJGs/discussion

Ben Goodrich, answered Oct 20 '22 14:10


I wrote that I would post what I learned.

The university Supercomputing Center believes that RStan will run on their machines. However, I must apply for an account, which might take some time. So, I will not be certain that RStan will run on those machines for a while yet. For what it is worth, the formal name of their facility is the 'Arctic Region Supercomputing Center'.

I had trouble installing RStan on my desktop and had to get OIT assistance. So, here are the steps I used and the code used by the OIT gentleman. I have a Windows 7 Professional operating system.

  1. I had to use R 2.15.1

  2. I installed R in the directory 'C:\R\R-2.15.1' so there would be no spaces in the directory name

  3. I had to install Rtools.

  4. I installed Rtools in the directory 'C:\Rtools'

  5. Make sure that Rtools appears in the path so that R can locate the C++ compiler in Rtools

  6. To check:

    Computer, Properties, Advanced System Setting, Environment Variables, Path.

    I think the path should include both 'c:\Rtools\bin' and 'c:\Rtools\gcc-4.6.3\bin'.

  7. Open R

  8. Here is the R code to type (this code appears here: http://code.google.com/p/stan/wiki/RStanGettingStarted):

    install.packages('inline')

    install.packages('Rcpp')

    install.packages('RcppEigen')

    options(repos = c(getOption("repos"), rstan = "http://wiki.stan.googlecode.com/git/R"))

    install.packages('rstan', type = 'source')

    library(rstan)

  9. Then I ran the school example from here:

http://code.google.com/p/stan/wiki/RStanGettingStarted
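
The PATH check in steps 5 and 6 can also be done from inside R. A minimal sketch using base R only; the helper name on_path is hypothetical, and the two directories are the ones listed in step 6:

```r
# Hypothetical helper: is `dir` one of the entries in the PATH variable?
# The comparison ignores case, slash direction, and trailing slashes.
on_path <- function(dir, path = Sys.getenv("PATH"), sep = .Platform$path.sep) {
    norm <- function(p) tolower(gsub("/+$", "", gsub("\\\\", "/", p)))
    entries <- strsplit(path, sep, fixed = TRUE)[[1]]
    norm(dir) %in% norm(entries)
}

on_path("c:\\Rtools\\bin")             # should be TRUE after step 5
on_path("c:\\Rtools\\gcc-4.6.3\\bin")  # should be TRUE after step 5
Sys.which("g++")                       # empty string if R cannot find the compiler
```

If either on_path() call returns FALSE, or Sys.which("g++") comes back empty, the Path variable probably needs to be edited and R restarted before installing rstan from source.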

Last week I had been trying to install Stan using the instructions in the PDF file 'stan-reference-1.0.2' instead of the instructions at the above link.

I hope this helps others. If and when I learn whether RStan definitely will run on the Supercomputing Center machines I will post here what I learn.

I have not uninstalled Stan to test the above procedure. Hopefully I did not make any errors in the above steps.

Mark Miller, answered Oct 20 '22 13:10