Given a posterior p(Θ|D) over some parameters Θ, one can define the following:
The Highest Posterior Density Region is the set of most probable values of Θ that, in total, constitute 100(1-α) % of the posterior mass.
In other words, for a given α, we look for a p* that satisfies:

    ∫_{Θ : p(Θ|D) > p*} p(Θ|D) dΘ = 1 - α
and then obtain the Highest Posterior Density Region as the set:

    {Θ : p(Θ|D) > p*}
Using the same notation as above, a Credible Region (or interval) is defined as any set C satisfying:

    ∫_C p(Θ|D) dΘ = 1 - α
Depending on the distribution, there could be many such intervals. The central credible interval is defined as the credible interval that excludes α/2 of the posterior mass from each tail.
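For instance, given draws from the posterior, the central credible interval is just a pair of empirical quantiles; a minimal sketch, where the Beta draws merely stand in for real posterior samples:

import numpy as np

alpha = 0.05
samples = np.random.beta(2, 5, size=100000)  # stand-in for posterior draws
l, u = np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])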
For general distributions, given samples from the distribution, are there any built-ins to obtain the two quantities above in Python or PyMC?

For common parametric distributions (e.g. Beta, Gaussian, etc.), are there any built-ins or libraries to compute this using SciPy or statsmodels?
4.3.4 Highest density interval (HDI). Another way of summarizing a distribution, which we will use often, is the highest density interval, abbreviated HDI. The HDI indicates which points of a distribution are most credible, and which cover most of the distribution.
By definition, a 95% equal tailed credible interval has to exclude 2.5% from each tail of the distribution. So, even if the mode of the posterior is at zero, if you exclude 2.5%, then you have to exclude zero. That's why I use highest density intervals (HDIs), not equal-tail CIs.
Confidence intervals are measures of uncertainty around effect estimates. Interpretation of the frequentist 95% confidence interval: across hypothesized repeats of the experiment, 95% of the intervals so constructed would contain the true (unknown) effect.

Credible intervals incorporate problem-specific contextual information from the prior distribution, whereas confidence intervals are based only on the data; credible intervals and confidence intervals also treat nuisance parameters in radically different ways.
From my understanding, a "central credible region" is not any different from how confidence intervals are calculated; all you need is the inverse of the cdf function at alpha/2 and 1 - alpha/2; in scipy this is called ppf (percent point function); so for a Gaussian posterior distribution:
>>> from scipy.stats import norm
>>> alpha = .05
>>> l, u = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)
to verify that [l, u] covers (1 - alpha) of the posterior density:
>>> norm.cdf(u) - norm.cdf(l)
0.94999999999999996
similarly, for a Beta posterior with, say, a=1 and b=3:
>>> from scipy.stats import beta
>>> l, u = beta.ppf(alpha / 2, a=1, b=3), beta.ppf(1 - alpha / 2, a=1, b=3)
and again:
>>> beta.cdf(u, a=1, b=3) - beta.cdf(l, a=1, b=3)
0.94999999999999996
here you can see the parametric distributions that are included in scipy, and I guess all of them have a ppf function;
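worth noting, scipy distributions also expose an interval method that gives this equal-tailed interval in one call:

>>> norm.interval(0.95)                # ≈ (-1.96, 1.96)
>>> beta.interval(0.95, a=1, b=3)      # ≈ (0.0084, 0.7076)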
As for the highest posterior density region, it is more tricky, since the pdf function is not necessarily invertible, and in general such a region may not even be connected; for example, in the case of Beta with a = b = .5 (as can be seen here).

But in the case of the Gaussian distribution, it is easy to see that the "Highest Posterior Density Region" coincides with the "Central Credible Region", and that is the case for all symmetric unimodal distributions (i.e. if the pdf function is symmetric around the mode of the distribution).
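For asymmetric but still unimodal distributions there is a simpler route than solving for p*: the HPD interval is the shortest interval with 1 - alpha coverage, so you can minimize the interval width over the lower tail probability. A minimal sketch (hpd_interval is my own helper name, not a library built-in):

from scipy.stats import beta
from scipy.optimize import fminbound

def hpd_interval(dist, alpha=0.05):
    # for a unimodal distribution, the HPD interval is the shortest
    # interval containing 1 - alpha of the mass; search over the lower
    # tail probability q, with the interval [ppf(q), ppf(q + 1 - alpha)]
    width = lambda q: dist.ppf(q + 1 - alpha) - dist.ppf(q)
    q = fminbound(width, 0, alpha)
    return dist.ppf(q), dist.ppf(q + 1 - alpha)

# e.g. a skewed Beta posterior, where HPD and central intervals differ
print(hpd_interval(beta(a=2, b=5)))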
A possible numerical approach for the general case would be a binary search over the value of p*, using numerical integration of the pdf and utilizing the fact that the integral is a monotone function of p* (see the bisection sketch after the worked example below).
Here is an example for a mixture of Gaussians:

[1] The first thing you need is an analytical pdf function; for a mixture of Gaussians that is easy:
import numpy as np
from scipy.stats import norm

def mix_norm_pdf(x, loc, scale, weight):
    # weighted sum of the component normal densities
    return np.dot(weight, norm.pdf(x, loc, scale))
so, for example, for location, scale and weight values as in

loc    = np.array([-1, 3])   # mean values
scale  = np.array([.5, .8])  # standard deviations
weight = np.array([.4, .6])  # mixture probabilities
you will get two nice Gaussian distributions holding hands:
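a quick matplotlib sketch of that mixture density (assuming matplotlib is available):

import matplotlib.pyplot as plt

xs = np.linspace(-3, 6, 500)
plt.plot(xs, [mix_norm_pdf(x, loc, scale, weight) for x in xs])
plt.xlabel('x')
plt.ylabel('density')
plt.show()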
[2] Now you need an error function which, given a test value for p*, integrates the pdf above p* and returns the squared error from the desired value 1 - alpha:
def errfn(p, alpha, *args):
    from scipy import integrate

    def fn(x):
        pdf = mix_norm_pdf(x, *args)
        return pdf if pdf > p else 0

    # ideally integration limits should not be hard coded but inferred
    lb, ub = -3, 6
    prob = integrate.quad(fn, lb, ub)[0]
    return (prob + alpha - 1.0) ** 2
[3] Now, for a given value of alpha, we can minimize the error function to obtain p*:
from scipy.optimize import fmin

alpha = .05
p = fmin(errfn, x0=0, args=(alpha, loc, scale, weight))[0]
which results in p* = 0.0450, and the HPD as shown below; the red area represents 1 - alpha of the distribution, and the horizontal dashed line is p*.
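Since the covered mass shrinks monotonically as the threshold grows, the squared-error minimization can also be swapped for a bracketing root finder, which is the binary search mentioned above; a sketch using scipy.optimize.brentq (integration limits are still hard-coded, as in errfn):

from scipy import integrate
from scipy.optimize import brentq

def coverage(p, lb=-3, ub=6):
    # mass of the region where the mixture pdf exceeds the threshold p
    def fn(x):
        d = mix_norm_pdf(x, loc, scale, weight)
        return d if d > p else 0
    return integrate.quad(fn, lb, ub)[0]

# bracket p* between 0 (full coverage) and the pdf maximum (zero coverage)
p_max = max(mix_norm_pdf(x, loc, scale, weight) for x in np.linspace(-3, 6, 1000))
p_star = brentq(lambda p: coverage(p) - (1 - alpha), 0, p_max)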
To calculate the HPD you can leverage pymc3; here is an example:

import pymc3
from scipy.stats import norm

a = norm.rvs(size=10000)
pymc3.stats.hpd(a)
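note that in recent releases this helper moved out of pymc3 into ArviZ; a minimal sketch, assuming arviz is installed:

import numpy as np
import arviz as az

samples = np.random.normal(size=10000)
az.hdi(samples, hdi_prob=0.95)   # highest density interval from raw samples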