Truncated multivariate normal in SciPy?

Tags:

I'm trying to automate a process that at some point needs to draw samples from a truncated multivariate normal. That is, it's a normal multivariate normal distribution (i.e. Gaussian) but the variables are constrained to a cuboid. My given inputs are the mean and covariance of the full multivariate normal but I need samples in my box.

Up to now, I'd just been rejecting samples outside the box and resampling as necessary, but I'm starting to find that my process sometimes gives me (a) large covariances and (b) means that are close to the edges. These two events conspire against the speed of my system.

So what I'd like to do is sample the distribution correctly in the first place. Googling led only to this discussion or the truncnorm distribution in scipy.stats. The former is inconclusive and the latter seems to be for one variable. Is there any native multivariate truncated normal? And is it going to be any better than rejecting samples, or should I do something smarter?

I'm going to start working on my own solution, which would be to rotate the untruncated Gaussian to it's principal axes (with an SVD decomposition or something), use a product of truncated Gaussians to sample the distribution, then rotate that sample back, and reject/resample as necessary. If the truncated sampling is more efficient, I think this should sample the desired distribution faster.

213

asked Nov 21 '13 08:11

Warrick

1 Answers

So, according to the Wikipedia article, sampling a multivariate truncated normal distribution (MTND) is more difficult. I ended up taking a relatively easy way out and using an MCMC sampler to relax an initial guess towards the MTND as follows.

I used emcee to do the MCMC work. I find this package phenomenally easy-to-use. It only requires a function that returns the log-probability of the desired distribution. So I defined this function

from numpy.linalg import inv

def lnprob_trunc_norm(x, mean, bounds, C):
    if np.any(x < bounds[:,0]) or np.any(x > bounds[:,1]):
        return -np.inf
    else:
        return -0.5*(x-mean).dot(inv(C)).dot(x-mean)

Here, C is the covariance matrix of the multivariate normal. Then, you can run something like

S = emcee.EnsembleSampler(Nwalkers, Ndim, lnprob_trunc_norm, args = (mean, bounds, C))

pos, prob, state = S.run_mcmc(pos, Nsteps)

for given mean, bounds and C. You need an initial guess for the walkers' positions pos, which could be a ball around the mean,

pos = emcee.utils.sample_ball(mean, np.sqrt(np.diag(C)), size=Nwalkers)

or sampled from an untruncated multivariate normal,

pos = numpy.random.multivariate_normal(mean, C, size=Nwalkers)

and so on. I personally do several thousand steps of sample discarding first, because it's fast, then force the remaining outliers back within the bounds, then run the MCMC sampling.

The number of steps for convergence is up to you.

Note also that emcee easily supports basic parallelization by adding the argument threads=Nthreads to the EnsembleSampler initialization. So you can make this blazing fast.

answered Sep 27 '22 18:09

Warrick

Related questions
                            
                                UUID python import fails on Cygwin 64bits
                            
                                Celery - How does countdown work?
                            
                                Django - Using a different email backend for admin error emails
                            
                                Load high-dimensional R dataset into Pandas DataFrame
                            
                                Emit signal in standard python thread
                            
                                How to add new requests for my Scrapy Spider during crawling
                            
                                matplotlib - radius in polygon edges - is it possible?
                            
                                Numpy efficient big matrix multiplication
                            
                                Outer join in Python for thousands of large tables
                            
                                Chi Square Test of Independence in Python
                            
                                Symbol not found, Expected in: flat namespace
                            
                                Check if an object is order-able in python?
                            
                                Django: HttpResponseRedirect not working
                            
                                How can I access a custom section in a Pyramid .ini file?
                            
                                Save raw data as tif
                            
                                SWIG python initialise a pointer to NULL
                            
                                Monitoring Java application with Python [closed]
                            
                                How to find a substring using partial matching [closed]
                            
                                Django: check if value in values_list with & without prefetch_related/select_related
                            
                                local histogram equalization

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Truncated multivariate normal in SciPy?

Tags:

python

scipy

Warrick

People also ask

1 Answers

Warrick

Recent Activity

Donate For Us