Could someone give some general instructions on how to parallelize PyMC MCMC code? I am trying to run LASSO regression following the example given here. I read somewhere that parallel sampling is done by default, but do I still need to use something like Parallel Python to get it to work?
Here is some reference code that I would like to be able to parallelize on my machine.
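import numpy as np
import pymc
from scipy.stats import norm

# n (sample size) and b (Laplace scale) are assumed to be defined beforehand.
x1 = norm.rvs(0, 1, size=n)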
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)
X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3
beta1_lasso = pymc.Laplace('beta1', mu=0, tau=1.0 / b)
beta2_lasso = pymc.Laplace('beta2', mu=0, tau=1.0 / b)
beta3_lasso = pymc.Laplace('beta3', mu=0, tau=1.0 / b)
@pymc.deterministic
def y_hat_lasso(beta1=beta1_lasso, beta2=beta2_lasso, beta3=beta3_lasso, x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3
Y_lasso = pymc.Normal('Y', mu=y_hat_lasso, tau=1.0, value=y, observed=True)
lasso_model = pymc.Model([Y_lasso, beta1_lasso, beta2_lasso, beta3_lasso])
lasso_MCMC = pymc.MCMC(lasso_model)
lasso_MCMC.sample(iter=20000, burn=5000, thin=2)
It looks like you are using PyMC2. As far as I know, PyMC2 does not sample in parallel on its own, so you must use a separate Python approach to parallel computation, such as IPython.parallel. There are many ways to do this, but all the ones I know are a little bit complicated. Here is an example of one, which uses PyMC2, IPCluster, and Wakari.
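To make the "some Python approach to parallel computation" idea concrete, here is a minimal sketch that runs several independent chains in separate processes using only the standard library. It does not use PyMC at all: run_chain, run_chains, and log_target are hypothetical names, and the toy random-walk Metropolis sampler is only a stand-in for building and sampling a pymc.MCMC object inside each worker.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def log_target(beta, b=1.0):
    """Unnormalised log-density of a Laplace(0, b) distribution (toy target)."""
    return -abs(beta) / b


def run_chain(seed, n_iter=2000, burn=500, step=1.0):
    """Run one random-walk Metropolis chain.

    Stand-in (assumption) for creating a pymc.MCMC object and calling
    its sample method inside the worker process.
    """
    rng = random.Random(seed)  # per-chain seed: chains differ but are reproducible
    beta = 0.0
    current = log_target(beta)
    samples = []
    for i in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        proposed = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(beta)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            beta, current = proposal, proposed
        if i >= burn:  # discard the burn-in draws
            samples.append(beta)
    return samples


def run_chains(seeds, use_processes=True):
    """Run one chain per seed and return the chains in seed order.

    Processes give real CPU parallelism; the thread option is only
    useful for debugging, since threads share one interpreter lock.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=len(seeds)) as pool:
        return list(pool.map(run_chain, seeds))


if __name__ == "__main__":
    seeds = [1, 2, 3]
    for seed, chain in zip(seeds, run_chains(seeds)):
        print(seed, len(chain), sum(chain) / len(chain))
```

The worker function has to live at module level so it can be pickled, and the `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script. The chains are fully independent, so the traces can simply be pooled afterwards or compared for convergence.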
In PyMC3, parallel sampling is implemented in the psample method, but your reference code will need to be updated to the PyMC3 format:
with pm.Model() as model:
    beta1 = pm.Laplace('beta1', mu=0, b=b)
    beta2 = pm.Laplace('beta2', mu=0, b=b)
    beta3 = pm.Laplace('beta3', mu=0, b=b)
    y_hat = beta1 * x1 + beta2 * x2 + beta3 * x3
    y_obs = pm.Normal('y_obs', mu=y_hat, tau=1.0, observed=y)
    trace = pm.psample(draws=20000, step=pm.Slice(), threads=3)
PyMC3 has merged psample into sample. To run in parallel, set the parameter njobs > 1.
The usage for the pymc.sample function is:
sample(draws, step, start=None, trace=None, chain=0, njobs=1, tune=None,
       progressbar=True, model=None, random_seed=None)
Note that if you set njobs=None, it will default to the number of CPUs minus 2.
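The njobs=None rule can be written out as a small helper. This is only a sketch of the arithmetic described above, not PyMC3's own code: resolve_njobs is a hypothetical name, and the floor of 1 on machines with two or fewer CPUs is an assumption added so the result is always a usable worker count.

```python
import multiprocessing


def resolve_njobs(njobs=None, cpu_count=None):
    """Return the number of parallel jobs implied by an njobs setting.

    Hypothetical helper: mirrors the "number of CPUs minus 2" default
    described above. The lower bound of 1 is an assumption.
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    if njobs is None:
        return max(cpu_count - 2, 1)
    return njobs


if __name__ == "__main__":
    # On an 8-CPU machine the default leaves two CPUs free.
    print(resolve_njobs(None, cpu_count=8))
    # An explicit value is passed through unchanged.
    print(resolve_njobs(3, cpu_count=8))
```

Passing an explicit njobs is usually clearer than relying on the default, since the number of chains you get then does not depend on the machine the code runs on.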
I hope this helps.