Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vectorized implementation for `numpy.random.multivariate_normal`

Tags:

python

numpy

I am trying to use numpy.random.multivariate_normal to generate multiple samples where each sample is drawn from a multivariate Normal distribution with a different mean and cov. For example, if I would like to draw 2 samples, I tried

from numpy import random as rand

means = np.array([[-1., 0.], [1., 0.]])
covs = np.array([np.identity(2) for k in xrange(2)]) 
rand.multivariate_normal(means, covs)

but this results in ValueError: mean must be 1 dimensional. Do I have to do a for loop for this? I thought that for functions like rand.binomial this is possible.

like image 751
p-value Avatar asked Apr 05 '18 20:04

p-value


1 Answers

As @hpaulj suggested, you can generate samples from the standard multivariate normal distribution, and then use, say, einsum and/or broadcasting to transform the samples. The scaling is done by multiplying the standard sample points by the square root of the covariance matrix. In the following, I use scipy.linalg.sqrtm to compute the matrix square root, and numpy.einsum to do the matrix multiplication.

import numpy as np
from scipy.linalg import sqrtm
import matplotlib.pyplot as plt


# Sequence of means
means = np.array([[-15., 0.], [15., 0.], [0., 0.]])
# Sequence of covariance matrices.  Must be the same length as means.
covs = np.array([[[ 3, -1],
                  [-1,  2]],
                 [[ 1,  2],
                  [ 2,  5]],
                 [[ 1,  0],
                  [ 0,  1]]])
# Number of samples to generate for each (mean, cov) pair.
nsamples = 4000

# Compute the matrix square root of each covariance matrix.
sqrtcovs = np.array([sqrtm(c) for c in covs])

# Generate samples from the standard multivariate normal distribution.
dim = len(means[0])
u = np.random.multivariate_normal(np.zeros(dim), np.eye(dim),
                                  size=(len(means), nsamples,))
# u has shape (len(means), nsamples, dim)

# Transform u.
v = np.einsum('ijk,ikl->ijl', u, sqrtcovs)
m = np.expand_dims(means, 1)
t = v + m

# t also has shape (len(means), nsamples, dim).
# t[i] holds the nsamples sampled from the distribution with mean means[i]
# and covariance cov[i].

plt.subplot(2, 1, 1)
plt.plot(t[...,0].ravel(), t[...,1].ravel(), '.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()

# Make another plot, where we generate the samples by passing the given
# means and covs to np.random.multivariate_normal.  This plot should look
# the same as the first plot.
plt.subplot(2, 1, 2)
p0 = np.random.multivariate_normal(means[0], covs[0], size=nsamples)
p1 = np.random.multivariate_normal(means[1], covs[1], size=nsamples)
p2 = np.random.multivariate_normal(means[2], covs[2], size=nsamples)

plt.plot(p0[:,0], p0[:,1], 'b.', alpha=0.02)
plt.plot(p1[:,0], p1[:,1], 'g.', alpha=0.02)
plt.plot(p2[:,0], p2[:,1], 'r.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()

plot

This method might not be any faster that looping over the means and covs arrays and calling multivariate_normal once for each pair (mean, cov). The case where this method would give the most benefit is when you have many different means and covariances and are generating a small number of samples per pair. And even then, it might not be faster, because the script uses a Python loop over the covs array to call sqrtm for each covariance matrix. If performance is critical, test with your actual data.

like image 166
Warren Weckesser Avatar answered Oct 26 '22 19:10

Warren Weckesser