Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Truncated multivariate normal in SciPy?

Tags:

python

scipy

I'm trying to automate a process that at some point needs to draw samples from a truncated multivariate normal. That is, it's a normal multivariate normal distribution (i.e. Gaussian) but the variables are constrained to a cuboid. My given inputs are the mean and covariance of the full multivariate normal but I need samples in my box.

Up to now, I'd just been rejecting samples outside the box and resampling as necessary, but I'm starting to find that my process sometimes gives me (a) large covariances and (b) means that are close to the edges. These two events conspire against the speed of my system.

So what I'd like to do is sample the distribution correctly in the first place. Googling led only to this discussion or the truncnorm distribution in scipy.stats. The former is inconclusive and the latter seems to be for one variable. Is there any native multivariate truncated normal? And is it going to be any better than rejecting samples, or should I do something smarter?

I'm going to start working on my own solution, which would be to rotate the untruncated Gaussian to it's principal axes (with an SVD decomposition or something), use a product of truncated Gaussians to sample the distribution, then rotate that sample back, and reject/resample as necessary. If the truncated sampling is more efficient, I think this should sample the desired distribution faster.

like image 213
Warrick Avatar asked Nov 21 '13 08:11

Warrick


People also ask

How do you find the Gaussian distribution in Python?

Use the random. normal() method to get a Normal Data Distribution.

Which method represent cumulative distribution function of a defined distribution available in Scipy stats module?

Common methodscdf: Cumulative Distribution Function. sf: Survival Function (1-CDF) ppf: Percent Point Function (Inverse of CDF)

How do I get Scipy stats?

All of the statistics functions are located in the sub-package scipy. stats and a fairly complete listing of these functions can be obtained using info(stats) function. A list of random variables available can also be obtained from the docstring for the stats sub-package.

What is Scipy stats used for?

The scipy. stats module specializes in random variables and probability distributions. It implements more than 80 continuous distributions and 10 discrete distributions. In what follows we learn how to use the basic functionality.


1 Answers

So, according to the Wikipedia article, sampling a multivariate truncated normal distribution (MTND) is more difficult. I ended up taking a relatively easy way out and using an MCMC sampler to relax an initial guess towards the MTND as follows.

I used emcee to do the MCMC work. I find this package phenomenally easy-to-use. It only requires a function that returns the log-probability of the desired distribution. So I defined this function

from numpy.linalg import inv

def lnprob_trunc_norm(x, mean, bounds, C):
    if np.any(x < bounds[:,0]) or np.any(x > bounds[:,1]):
        return -np.inf
    else:
        return -0.5*(x-mean).dot(inv(C)).dot(x-mean)

Here, C is the covariance matrix of the multivariate normal. Then, you can run something like

S = emcee.EnsembleSampler(Nwalkers, Ndim, lnprob_trunc_norm, args = (mean, bounds, C))

pos, prob, state = S.run_mcmc(pos, Nsteps)

for given mean, bounds and C. You need an initial guess for the walkers' positions pos, which could be a ball around the mean,

pos = emcee.utils.sample_ball(mean, np.sqrt(np.diag(C)), size=Nwalkers)

or sampled from an untruncated multivariate normal,

pos = numpy.random.multivariate_normal(mean, C, size=Nwalkers)

and so on. I personally do several thousand steps of sample discarding first, because it's fast, then force the remaining outliers back within the bounds, then run the MCMC sampling.

The number of steps for convergence is up to you.

Note also that emcee easily supports basic parallelization by adding the argument threads=Nthreads to the EnsembleSampler initialization. So you can make this blazing fast.

like image 99
Warrick Avatar answered Sep 27 '22 18:09

Warrick