
MC-Stan on Spark?

I would like to use MC-Stan on Spark, but Google turns up no related pages.

I'm not sure whether this approach is even possible on Spark, so I would appreciate it if someone could let me know.

I also wonder what the widely used approach for running MCMC on Spark is. I have heard that Scala is common on Spark, but I need a language that has a decent MCMC library such as MC-Stan.

asked Dec 18 '22 by Kim


1 Answer

Yes, it's certainly possible, but it requires a bit more work. Stan (and every popular MCMC tool I know of) is not designed to be run in a distributed setting, via Spark or otherwise. In general, distributed MCMC is an area of active research. For a recent review, I'd recommend section 4 of Patterns of Scalable Bayesian Inference (PoFSBI). There are multiple ways you might want to split up a big MCMC computation, but I think one of the more straightforward ones is splitting up the data and running an off-the-shelf tool like Stan, with the same model, on each partition. Each partition's fit produces a subposterior, and the subposteriors can then be reduce'd together to form an approximate full posterior. PoFSBI discusses several ways of combining such subposteriors.
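To make the shape of that split/fit/reduce idea concrete, here is a minimal, self-contained toy sketch (my illustration, not code from PoFSBI): it substitutes a tiny NumPy random-walk Metropolis sampler for Stan so it runs as-is, splits the data into PySpark partitions, and merges the subposterior draws with a simple inverse-variance weighted average inside reduce. The model (a normal mean with known unit variance and a flat prior) and all the function names here are hypothetical.

```python
import numpy as np
from pyspark import SparkContext

def fit_subposterior(rows):
    """Toy stand-in for a per-partition Stan fit: random-walk Metropolis
    for the mean of this partition's data (normal likelihood, unit variance,
    flat prior).  Yields (precision-weighted draws, precision) so that the
    reduce step is just a sum."""
    x = np.array(list(rows))
    log_post = lambda t: -0.5 * np.sum((x - t) ** 2)
    theta, draws = float(x.mean()), []
    for _ in range(5000):
        prop = theta + 0.5 / np.sqrt(len(x)) * np.random.randn()
        if np.log(np.random.rand()) < log_post(prop) - log_post(theta):
            theta = prop
        draws.append(theta)
    draws = np.array(draws[1000:])        # drop burn-in
    w = 1.0 / draws.var()                 # subposterior precision
    yield (w * draws, w)

def combine(a, b):
    """Merge two (weighted draws, weight) pairs; dividing the summed weighted
    draws by the summed weights at the end gives the simple precision-weighted
    consensus average discussed in PoFSBI section 4."""
    return (a[0] + b[0], a[1] + b[1])

sc = SparkContext("local[4]", "split-and-reduce-mcmc")
data = np.random.normal(loc=3.0, scale=1.0, size=10_000)
weighted_sum, total_weight = (
    sc.parallelize(data.tolist(), numSlices=4)  # split the data
      .mapPartitions(fit_subposterior)          # same model on every shard
      .reduce(combine)                          # merge the subposteriors
)
sc.stop()
consensus_draws = weighted_sum / total_weight
print("consensus posterior mean:", consensus_draws.mean())  # close to 3.0
```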

I've put together a very rough proof of concept using pyspark and pystan (Python being the common language with the best Stan and Spark support). It's a rough and limited implementation of the weighted-average consensus algorithm from PoFSBI, running on the tiny 8-schools dataset. I don't think this example is practically very useful, but it should give some idea of what's necessary to run Stan as a Spark program: partition the data, run Stan on each partition, and combine the subposteriors.
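Sketched below is roughly what such a program can look like; treat it as a hypothetical reconstruction rather than my exact code. It assumes the legacy PyStan 2 interface (pystan.StanModel / sampling; PyStan 3 uses stan.build(...).sample(...) instead), splits the 8 schools across 4 partitions, fits the same model on each partition with mapPartitions, and combines the draws of the shared parameters mu and tau with an inverse-variance weighted average. A careful consensus implementation would also flatten the prior on each partition, which this sketch skips.

```python
import numpy as np
import pystan                     # legacy PyStan 2.x interface assumed
from pyspark import SparkContext

SCHOOLS_CODE = """
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"""

# The 8-schools data, to be split into 4 partitions of 2 schools each.
schools = list(zip(
    [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0],    # y
    [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0],  # sigma
))

def fit_partition(rows):
    """Compile and fit the same Stan model on one partition's schools and
    yield that partition's draws of the shared parameters (mu, tau).
    Compiling inside every worker is wasteful, but keeps the sketch simple."""
    rows = list(rows)
    data = {"J": len(rows),
            "y": [r[0] for r in rows],
            "sigma": [r[1] for r in rows]}
    fit = pystan.StanModel(model_code=SCHOOLS_CODE).sampling(
        data=data, iter=2000, chains=2, seed=1)
    draws = fit.extract()
    yield np.column_stack([draws["mu"], draws["tau"]])

sc = SparkContext("local[4]", "stan-on-spark-sketch")
subposteriors = (
    sc.parallelize(schools, numSlices=4)
      .mapPartitions(fit_partition)
      .collect())                 # one (num_draws, 2) array per partition
sc.stop()

# Weighted-average consensus combination: weight each subposterior's draws
# by its inverse sample variance, parameter by parameter.
draws = np.stack(subposteriors)                    # (M, S, 2)
weights = 1.0 / draws.var(axis=1, keepdims=True)   # (M, 1, 2)
consensus = (weights * draws).sum(axis=0) / weights.sum(axis=0)
print("consensus posterior means (mu, tau):", consensus.mean(axis=0))
```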

answered Jan 11 '23 by homer