I am trying to use PyMC to estimate the parameters of my model. However, I do not understand how to estimate the parameters of a model that is not a standard distribution but rather a sum or a function of other distributions.
Example: let's say I have data "Data" generated by a process that is the sum of two random variables, X and Y, drawn from uniform distributions with parameters (a, b) and (c, d) respectively. I would like to model this using PyMC and estimate the parameters a, b, c, and d. I am able to set up the priors for the parameters, but I am not sure how to specify the observed variable and bind it to the observed data.
If the distribution of the observed variable were a standard one (say O), I would just do:
obs = pm.DistO("obs", params, value=data, observed=True)
but this is not the case here. Can I model this scenario in PyMC at all?
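For instance, if the data were plausibly normal (a stand-in for "DistO"; the mu and tau values here are only illustrative), the PyMC2 idiom would look like:
mu = pm.Normal("mu", 0.0, 0.001)
obs = pm.Normal("obs", mu=mu, tau=1.0, value=data, observed=True)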
The Python code I am using is below:
import numpy as np
import pymc as pm
# Generate the synthetic data
a = 2.0
b = 8.0
c = 6.0
d = 10.0
d1 = np.random.uniform(a, b, 100)
d2 = np.random.uniform(c, d, 100)
data = d1 + d2
# Now let's try to recover the parameters from the observed data.
# Set up the priors (note: PyMC2's Normal is parameterized as (mu, tau)).
p_a = pm.Normal("pa", 0.0, 10.0)
p_b = pm.Normal("pb", 0.0, 10.0)
p_c = pm.Normal("pc", 0.0, 10.0)
p_d = pm.Normal("pd", 0.0, 10.0)
p_d1 = pm.Uniform("pd1", p_a, p_b)
p_d2 = pm.Uniform("pd2", p_c, p_d)
# Here is where I am confused:
# p_data = p_d1 + p_d2
# How do I specify that p_data's value is observed (the observations are in "data")?
# TODO: Use MCMC to sample and obtain traces
You can model this scenario in PyMC2, and in one sense it is easy to do. In another sense it is hard, so I will demonstrate a solution for the special case of a model where $b-a = d-c$.
I say it is easy because PyMC2 can use an arbitrary function as the data log-likelihood, via the observed decorator. For example:
@pm.observed
def pdata(value=data, pa=pa, pb=pb, pc=pc, pd=pd):
    logpr = 0.
    # code to calculate logpr from the data and the parameters
    return logpr
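(In PyMC2's decorator form, the keyword defaults pa=pa, etc. bind the parent stochastics, and value=data together with @pm.observed fixes the node at the observed data.)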
I say it is hard because you have to come up with the code to calculate logpr yourself, and I find that complicated and error-prone for cases like the sum of two uniforms. This code also runs in the inner loop of the MCMC, so efficiency matters.
If the data were from a single uniform with unknown support, you could use the decorator approach as follows:
@pm.observed
def pdata(value=data, pa=pa, pb=pb):
    return pm.uniform_like(value, pa, pb)
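For intuition, the uniform log-likelihood being computed there is just $-n\log(b-a)$ when all observations fall inside $[a, b]$. A rough NumPy equivalent (an illustration, not PyMC2's actual implementation):
def uniform_loglike(x, lo, hi):
    x = np.asarray(x)
    if hi <= lo or np.any(x < lo) or np.any(x > hi):
        return -np.inf                    # zero probability outside the support
    return -x.size * np.log(hi - lo)      # sum of log(1/(hi-lo)) over the data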
In the special case of your model where the uniforms have the same width, the data likelihood is proportional to a triangular distribution, and you can write it out with only a moderate amount of pain:
@pm.observed
def pdata(value=data, pa=pa, pb=pb, pc=pc, pd=pd):
    pd = pc + (pb - pa)  # enforce equal widths; the sampled pd is not used
    # make sure the ordering of the endpoints is acceptable
    if pb < pa or pd < pc:
        return -np.inf
    x = value
    # unnormalized triangular density on [pa+pc, pb+pd], peaked at the midpoint,
    # divided by the area of the triangle to normalize
    pr = (np.where(x < pa + pc, 0.,
          np.where(x <= (pa + pb + pc + pd) / 2, x - (pa + pc),
          np.where(x <= (pb + pd), (pb + pd) - x, 0.)))
          / (.5 * ((pb + pd) - (pa + pc)) * ((pb - pa) + (pd - pc)) / 2))
    return np.sum(np.log(pr))
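Once pdata is defined, sampling proceeds as usual in PyMC2. A minimal sketch, assuming the priors were created as stochastics named "pa" through "pd" (the iteration counts are arbitrary):
mcmc = pm.MCMC([pa, pb, pc, pd, pdata])
mcmc.sample(20000, burn=10000)
pa_samples = mcmc.trace("pa")[:]  # posterior draws for a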
For the general sum-of-two-uniforms model you posted, the distribution is trapezoidal, and writing it out is more work than I am up for. Approximate Bayesian Computation (ABC) is a promising approach for situations like this, where computing the log-likelihood explicitly is too much work.
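To make the idea concrete, here is a rough ABC rejection sketch, outside PyMC entirely; the priors, summary statistics, and tolerance below are illustrative assumptions, not part of the model above:
def abc_rejection(data, n_draws=200000, tol=0.2):
    obs_stats = np.array([data.mean(), data.std()])
    kept = []
    for _ in range(n_draws):
        a, c = np.random.normal(0.0, 10.0, 2)  # hypothetical location priors
        b = a + np.random.exponential(5.0)     # width priors keep b > a, d > c
        d = c + np.random.exponential(5.0)
        # simulate a data set and keep the draw if its summaries are close
        sim = (np.random.uniform(a, b, data.size)
               + np.random.uniform(c, d, data.size))
        sim_stats = np.array([sim.mean(), sim.std()])
        if np.linalg.norm(sim_stats - obs_stats) < tol:
            kept.append((a, b, c, d))
    return np.array(kept)  # approximate posterior draws of (a, b, c, d)
With only the mean and standard deviation as summaries, the four parameters are not fully identified, so richer summaries (e.g. several quantiles) would be needed in practice.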
Here is a notebook collecting all of this.