Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to speed up PyMC markov model?

Tags:

python

pymc

pymc3

Is there a way to speed up this simple PyMC model? On 20-40 data points, it takes ~5-11 seconds to fit.

import pymc
import time
import numpy as np
from collections import OrderedDict

# prior probability of rain
p_rain = 0.5
variables = OrderedDict()
# rain observations
data = [True, True, True, True, True,
        False, False, False, False, False]*4
num_steps = len(data)
p_rain_given_rain = 0.9
p_rain_given_norain = 0.2
p_umbrella_given_rain = 0.8
p_umbrella_given_norain = 0.3
for n in range(num_steps):
    if n == 0:
        # Rain node at time t = 0
        rain = pymc.Bernoulli("rain_%d" %(n), p_rain)
    else:
        rain_trans = \
          pymc.Lambda("rain_trans",
                      lambda prev_rain=variables["rain_%d" %(n-1)]: \
                      prev_rain*p_rain_given_rain + (1-prev_rain)*p_rain_given_norain)
        rain = pymc.Bernoulli("rain_%d" %(n), p=rain_trans)
    umbrella_obs = \
      pymc.Lambda("umbrella_obs",
                  lambda rain=rain: \
                  rain*p_umbrella_given_rain + (1-rain)*p_umbrella_given_norain)
    umbrella = pymc.Bernoulli("umbrella_%d" %(n), p=umbrella_obs,
                              observed=True,
                              value=data[n])
    variables["rain_%d" %(n)] = rain
    variables["umbrella_%d" %(n)] = umbrella

print "running on %d points" %(len(data))
all_vars = variables.values()
t_start = time.time()
model = pymc.Model(all_vars)
m = pymc.MCMC(model)
m.sample(iter=2000)
t_end = time.time()
print "\n%.2f secs to run" %(t_end - t_start)

With only 40 data points, it takes 11 seconds to run:

running on 40 points
 [-----------------100%-----------------] 2000 of 2000 complete in 11.5 sec
11.54 secs to run

(with 80 points it takes 20 seconds). This is a toy example. The expressions inside Lambda() that determine transitions are in practice more complex. This basic code structure is flexible (whereas encoding the model with transition matrices is less flexible). Is there a way to keep a similar code structure but get better performance? Happy to switch to PyMC3 if necessary. Thanks.

like image 938
yyk Avatar asked Dec 08 '15 16:12

yyk


1 Answers

Markov-Chain Monte Carlo is a known sequential problem.

This means its runtime proportional to number of steps and run time of your fitness function.

There are some tricks you can do, however:

  • Use PyPy (requires rewrite, pymc not supported)
  • Use Gibbs sampling to improve your next step
  • Use multiple start points (in parallel)
  • Use multiple branches (in parallel)
  • Use heuristic to stop the chain earlier
  • Use approximation for points that are close to already computed

Harder approaches:

  • Use Numba (compiles fitness function to machine code)
  • rewrite your fitness function in C (or similar)
  • use native MCMC code (non-Python, requires the above)

Finally there's a lot of research out there:

http://www.mas.ncl.ac.uk/~ndjw1/docs/pbc.pdf

https://sites.google.com/site/parallelmcmc/

http://pyinsci.blogspot.com/2010/12/efficcient-mcmc-in-python.html (pypy)

like image 115
Dima Tisnek Avatar answered Sep 23 '22 02:09

Dima Tisnek