Posterior probability with pymc

(This question was originally posted on stats.O. I moved it here because it relates to pymc and to more general matters within it: in fact, the main aim is to get a better understanding of how pymc works. If any of the moderators believe it is not suitable for SO, I'll remove it.)

I've been reading the pymc tutorial and many other questions, both here and on SO.

I am trying to understand how to apply Bayes' theorem to compute the posterior probability using certain data. In particular, I have a tuple of independent parameters theta = (theta_1, ..., theta_n).

From the data D I'd like to infer the likelihood P(D | theta), where D is a certain event. Then the aim is to compute the posterior

P(theta | D) = P(D | theta) * P(theta) / P(D)

Some additional comments:

  1. This is a sort of unsupervised learning: I know that D happened, and I want to find the parameters theta that maximise the posterior probability P(theta | D). (*)
  2. I'd also like a parallel procedure in which I let pymc compute the likelihood given the data, and then get the posterior probability for each tuple of parameters.

In the following I'll assume that the likelihood is a multivariate normal distribution with a diagonal covariance matrix (because the parameters are independent).
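Concretely, with two parameters and an identity covariance (as in the code below), the independence assumption means the likelihood factorises; a sketch in my own notation:

\[
p(D \mid \theta_1, \theta_2)
  = \mathcal{N}\!\left( D \,\middle|\, (\theta_1, \theta_2)^\top,\, I \right)
  = \prod_{i=1}^{2} \mathcal{N}(D_i \mid \theta_i, 1)
\]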

The following is the code I'm using (for simplicity, assume there are only two parameters). The code is still under development (I know it can't work as it stands!), but I believe it is useful to include it and then refine it following comments and answers, to provide a skeleton for future reference.

import numpy as np
import pymc as pm  # PyMC 2.x

class ObsData(object):
    def __init__(self, params):
        self.theta1 = params[0]
        self.theta2 = params[1]

class Model(object):
    def __init__(self, data):
        # Priors
        self.theta1 = pm.Uniform('theta1', 0, 100)
        self.theta2 = pm.Normal('theta2', 0, 0.0001)  # mean 0, precision tau = 1e-4

        @pm.deterministic
        def model(
            theta1=self.theta1,
            theta2=self.theta2,
        ):
            return (theta1, theta2)

        # Attach the deterministic to the instance so pm.MCMC can find it
        self.model = model

        # Is this the actual likelihood?
        self.likelihood = pm.MvNormal(
            'likelihood',
            mu=self.model,
            tau=np.identity(2),
            value=data,  # is it correct to put the data here?
            observed=True
        )

def mcmc(observed_data):
    data = ObsData(observed_data)
    pymc_obj = Model(data)
    model = pm.MCMC(pymc_obj)
    model.sample(10000, verbose=0)  # Does this line compute the likelihood and the normalisation factor?
    # How do I get the posterior distribution?
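For reference, a hypothetical call to this skeleton could look like the following (the observation values are invented, and the model above still needs fixing before this will actually run):

observed = np.array([12.3, 4.5])  # two made-up observed values, one per parameter
mcmc(observed)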

The questions are the following:

  1. Does self.likelihood represent the Bayesian likelihood?
  2. How do I make use of the data? (I suspect value=data is incorrect.)
  3. Does .sample() actually compute the posterior probability?
  4. How do I get information from the posterior?
  5. (*) Should I include anything that relates to the fact that D happened at some point?
  6. As a general question: is there any way to use pymc to compute only the likelihood, given the data and the priors?

Any comments, as well as references to other questions or tutorials, are welcome!

asked Nov 09 '22 by rafforaffo
1 Answer

For starters, I think you want to return (theta1*theta2) from your definition of model.

model.sample draws samples from the posterior distributions of your parameters given your data; it does not compute them analytically (and you need sufficient burn-in, etc.). The likelihood of specific values for each tuple of parameters can then be determined from the joint posterior after sampling.
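For example, the posterior samples live in the traces of the MCMC object. A minimal, self-contained sketch of that workflow, with an invented observation and an illustrative burn-in value; note that I pass the raw data array as value, which is my assumption about the intended usage:

import numpy as np
import pymc as pm

# Hypothetical observed value of the 2-d event D
obs = np.array([12.3, 4.5])

theta1 = pm.Uniform('theta1', 0, 100)
theta2 = pm.Normal('theta2', 0, 0.0001)

@pm.deterministic
def mu(theta1=theta1, theta2=theta2):
    return (theta1, theta2)

likelihood = pm.MvNormal('likelihood', mu=mu, tau=np.identity(2),
                         value=obs, observed=True)

M = pm.MCMC([theta1, theta2, mu, likelihood])
M.sample(10000, burn=1000, verbose=0)  # sample the posterior, discarding burn-in

theta1_samples = M.trace('theta1')[:]  # posterior samples for theta1
print(theta1_samples.mean(), theta1_samples.std())
print(M.stats()['theta1'])             # mean, std, 95% HPD interval, etc.
pm.Matplot.plot(M)                     # trace and histogram plots for each variable

The histogram of theta1_samples is your (approximate) marginal posterior for theta1.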

I think you have a fundamental misunderstanding of MCMC at the moment. I can't think of a better way to answer your questions than to point you to the wonderful Bayesian Methods for Hackers.

answered Nov 14 '22 by sjc