I am struggling to implement a linear regression in pymc3 with a custom likelihood.
I previously posted this question on Cross Validated and it was recommended to post it here, since the question is more code oriented (closed post here).
Suppose you have two independent variables x1, x2 and a target variable y, as well as an indicator variable called delta.
Example snippet of observed data:
x_1   x_2   𝛿   observed_target
 10     1   0               100
 20     2   0                50
  5    -1   1               200
 10    -2   1               100
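For concreteness, the snippets below assume these rows live in a pandas DataFrame called df, matching the df['x1'] etc. references in the code:

import pandas as pd

# Example data from the table above.
df = pd.DataFrame({
    'x1': [10, 20, 5, 10],
    'x2': [1, 2, -1, -2],
    'delta': [0, 0, 1, 1],
    'observed_target': [100, 50, 200, 100],
})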
Does anyone know how this can be implemented in pymc3? As a starting point...
import pymc3 as pm

with pm.Model() as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal is parameterised by sd, not beta
    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    mu = intercept + beta_0*x_1 + beta_1*x_2
PyMC3 expects the logp() method to return a log-probability evaluated at the passed value argument. This method is used internally by all of the inference methods to calculate the model log-probability, which is then used for fitting models.
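For example, pm.DensityDist takes exactly such a logp callable. A minimal sketch, using a hand-written Normal log-density purely as a stand-in for a custom likelihood (the model and data here are made up for illustration):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as demo_model:
    mu = pm.Normal('mu', mu=0, sd=5)
    sd = pm.HalfNormal('sd', sd=1)

    def logp(value):
        # Log-probability of `value` under Normal(mu, sd); swap in any
        # custom log-likelihood expression here.
        return -0.5 * tt.log(2 * np.pi * sd**2) - (value - mu)**2 / (2 * sd**2)

    obs = pm.DensityDist('obs', logp, observed=np.random.randn(100))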
Alternatively, a custom likelihood can be written as a function of the model parameters and added to the model log-probability with pm.Potential.
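A minimal sketch of the pm.Potential route, again with a Normal log-density standing in for the custom term (y_obs is a made-up array of observations):

import numpy as np
import pymc3 as pm

y_obs = np.array([100., 50., 200., 100.])  # made-up observations

with pm.Model() as potential_model:
    mu = pm.Normal('mu', mu=0, sd=5)
    sd = pm.HalfNormal('sd', sd=1)
    # pm.Potential adds an arbitrary term to the model log-probability,
    # so the custom log-likelihood just needs to be a tensor expression.
    loglike = -0.5 * pm.math.log(2 * np.pi * sd**2) - (y_obs - mu)**2 / (2 * sd**2)
    pm.Potential('loglike', loglike.sum())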
In case this is helpful, from reading the docs it looks like something along these lines might work, but I have not been able to test it and it was too long to pop into a comment.
with pm.Model() as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal is parameterised by sd, not beta
    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    delta = pm.Data('delta', df['delta'])  # Or whatever this column is
    target = pm.Data('target', df['observed_target'])
    ypred = intercept + beta_0*x_1 + beta_1*x_2  # Intermediate result
    target_ge_ypred = pm.math.ge(target, ypred)  # Compare target to intermediate result
    zero = pm.math.constant(0)  # Use this if delta==1 and target<ypred
    # EDIT: Check delta
    alternate = pm.math.switch(target_ge_ypred, ypred, zero)  # Alternative result
    mu = pm.math.switch(pm.math.eq(delta, zero), ypred, alternate)  # Actual result wanted?
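If that switch logic captures the intent, the remaining step would be to plug mu and std into an observed likelihood and sample. A sketch assuming Gaussian errors, since the question does not state the error distribution:

with ttf_model:
    # Assumption: Normal errors around the switched mean.
    obs = pm.Normal('obs', mu=mu, sd=std, observed=target)
    trace = pm.sample(1000, tune=1000)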