how to speed up PyMC markov model?

Question

Is there a way to speed up this simple PyMC model? On 20-40 data points, it takes ~5-11 seconds to fit.

import pymc
import time
import numpy as np
from collections import OrderedDict

# prior probability of rain
p_rain = 0.5
variables = OrderedDict()
# rain observations
data = [True, True, True, True, True,
        False, False, False, False, False]*4
num_steps = len(data)
p_rain_given_rain = 0.9
p_rain_given_norain = 0.2
p_umbrella_given_rain = 0.8
p_umbrella_given_norain = 0.3
for n in range(num_steps):
    if n == 0:
        # Rain node at time t = 0
        rain = pymc.Bernoulli("rain_%d" %(n), p_rain)
    else:
        rain_trans = \
          pymc.Lambda("rain_trans",
                      lambda prev_rain=variables["rain_%d" %(n-1)]: \
                      prev_rain*p_rain_given_rain + (1-prev_rain)*p_rain_given_norain)
        rain = pymc.Bernoulli("rain_%d" %(n), p=rain_trans)
    umbrella_obs = \
      pymc.Lambda("umbrella_obs",
                  lambda rain=rain: \
                  rain*p_umbrella_given_rain + (1-rain)*p_umbrella_given_norain)
    umbrella = pymc.Bernoulli("umbrella_%d" %(n), p=umbrella_obs,
                              observed=True,
                              value=data[n])
    variables["rain_%d" %(n)] = rain
    variables["umbrella_%d" %(n)] = umbrella

print "running on %d points" %(len(data))
all_vars = variables.values()
t_start = time.time()
model = pymc.Model(all_vars)
m = pymc.MCMC(model)
m.sample(iter=2000)
t_end = time.time()
print "
%.2f secs to run" %(t_end - t_start)

With only 40 data points, it takes 11 seconds to run:

running on 40 points
 [-----------------100%-----------------] 2000 of 2000 complete in 11.5 sec
11.54 secs to run

(with 80 points it takes 20 seconds). This is a toy example. The expressions inside Lambda() that determine transitions are in practice more complex. This basic code structure is flexible (whereas encoding the model with transition matrices is less flexible). Is there a way to keep a similar code structure but get better performance? Happy to switch to PyMC3 if necessary. Thanks.

Dima Tisnek · Accepted Answer

Markov-Chain Monte Carlo is a known sequential problem.

This means its runtime proportional to number of steps and run time of your fitness function.

There are some tricks you can do, however:

Use PyPy (requires rewrite, pymc not supported)
Use Gibbs sampling to improve your next step
Use multiple start points (in parallel)
Use multiple branches (in parallel)
Use heuristic to stop the chain earlier
Use approximation for points that are close to already computed

Harder approaches:

Use Numba (compiles fitness function to machine code)
rewrite your fitness function in C (or similar)
use native MCMC code (non-Python, requires the above)

Finally there's a lot of research out there:

http://www.mas.ncl.ac.uk/~ndjw1/docs/pbc.pdf

https://sites.google.com/site/parallelmcmc/

http://pyinsci.blogspot.com/2010/12/efficcient-mcmc-in-python.html (pypy)

how to speed up PyMC markov model?

Tags:

python

pymc

pymc3

yyk

1 Answers

Dima Tisnek

Recent Activity

Donate For Us

how to speed up PyMC markov model?

Tags:

python

pymc

pymc3

yyk

1 Answers

Dima Tisnek

Related questions

Recent Activity

Donate For Us