
Parallelization of PyMC

Could someone give some general instructions on how to parallelize PyMC MCMC code? I am trying to run LASSO regression following the example given here. I read somewhere that parallel sampling is done by default, but do I still need to use something like Parallel Python to get it to work?

Here is some reference code that I would like to be able to parallelize on my machine.

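import numpy as np
import pymc  # PyMC2
from scipy.stats import norm

# n and b are not defined in the original post; these values are placeholders.
n = 100  # number of observations (assumed)
b = 1.0  # Laplace scale parameter (assumed)

x1 = norm.rvs(0, 1, size=n)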
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)

X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3

beta1_lasso = pymc.Laplace('beta1', mu=0, tau=1.0 / b)
beta2_lasso = pymc.Laplace('beta2', mu=0, tau=1.0 / b)
beta3_lasso = pymc.Laplace('beta3', mu=0, tau=1.0 / b)

@pymc.deterministic
def y_hat_lasso(beta1=beta1_lasso, beta2=beta2_lasso, beta3=beta3_lasso, x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3

Y_lasso = pymc.Normal('Y', mu=y_hat_lasso, tau=1.0, value=y, observed=True)

lasso_model = pymc.Model([Y_lasso, beta1_lasso, beta2_lasso, beta3_lasso])
lasso_MCMC = pymc.MCMC(lasso_model)
lasso_MCMC.sample(20000, 5000, 2)  # iter=20000, burn=5000, thin=2
cmlakhan asked May 07 '14 15:05



2 Answers

It looks like you are using PyMC2, and as far as I know, you must use some Python approach to parallel computation, like IPython.parallel. There are many ways to do this, but all the ones I know are a little bit complicated. Here is an example of one, which uses PyMC2, IPCluster, and Wakari.
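The general idea of running chains in separate Python processes can be sketched with the standard library alone. This is a minimal sketch, not the IPython.parallel setup mentioned above: a toy one-coefficient Metropolis sampler stands in for the PyMC2 model, and the names log_post, run_chain and run_chains are hypothetical helpers invented for the illustration. In a real setup, each worker would build the PyMC2 model and call its own sample().

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def log_post(beta, b=1.0):
    """Toy log-posterior (illustrative only): a Laplace(0, b) prior on one
    coefficient plus a unit-variance Gaussian likelihood for y = 1.0."""
    return -abs(beta) / b - 0.5 * (1.0 - beta) ** 2


def run_chain(seed, draws=2000, burn=500):
    """Run one random-walk Metropolis chain with its own seeded RNG."""
    rng = random.Random(seed)
    beta = 0.0
    lp = log_post(beta)
    samples = []
    for i in range(draws):
        prop = beta + rng.gauss(0.0, 1.0)
        lp_prop = log_post(prop)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            beta, lp = prop, lp_prop
        if i >= burn:
            samples.append(beta)
    return samples


def run_chains(seeds, parallel=True):
    """Run one independent chain per seed, in worker processes or serially."""
    if parallel:
        with ProcessPoolExecutor(max_workers=len(seeds)) as pool:
            return list(pool.map(run_chain, seeds))
    return [run_chain(seed) for seed in seeds]
```

Calling run_chains([1, 2, 3]) from under an if __name__ == "__main__": guard starts three worker processes and returns three lists of post-burn-in samples. Giving each chain its own seed keeps the chains independent and makes each one reproducible.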

In PyMC3, parallel sampling is implemented in the psample function, but your reference code will need to be updated to the PyMC3 syntax:

import pymc3 as pm  # x1, x2, x3, y and b are defined as in the question

with pm.Model() as model:
    beta1 = pm.Laplace('beta1', mu=0, b=b)
    beta2 = pm.Laplace('beta2', mu=0, b=b)
    beta3 = pm.Laplace('beta3', mu=0, b=b)

    y_hat = beta1 * x1 + beta2 * x2 + beta3 * x3
    y_obs = pm.Normal('y_obs', mu=y_hat, tau=1.0, observed=y)

    trace = pm.psample(draws=20000, step=pm.Slice(), threads=3)
Abraham D Flaxman answered Nov 10 '22 15:11



PyMC3 has merged psample into sample.

To run in parallel, set the njobs parameter to a value greater than 1.

The usage for the pymc.sample function is:

sample(draws, step, start=None, trace=None, chain=0, njobs=1, tune=None, progressbar=True, model=None, random_seed=None)

Note that if you set njobs=None, it will default to the number of CPUs minus 2.
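The njobs=None rule can be written out as a small helper. This is a sketch of the rule as described in this answer, not PyMC3's own source. The name resolve_njobs is hypothetical, and the floor of 1 is an assumption added so that machines with one or two CPUs still get a job.

```python
import multiprocessing


def resolve_njobs(njobs=1):
    """Return the number of parallel jobs implied by an njobs setting."""
    if njobs is None:
        # Rule quoted in the answer: number of CPUs minus 2.
        # The floor of 1 is an assumption, not taken from the answer.
        return max(multiprocessing.cpu_count() - 2, 1)
    return njobs
```

With this helper, resolve_njobs(4) returns 4 unchanged, while resolve_njobs(None) depends on the machine it runs on.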

I hope this helps.

closedloop answered Nov 10 '22 14:11
