I am trying to use numpy.random.multivariate_normal
to generate multiple samples where each sample is drawn from a multivariate Normal distribution with a different mean
and cov
. For example, if I would like to draw 2 samples, I tried
from numpy import random as rand
means = np.array([[-1., 0.], [1., 0.]])
covs = np.array([np.identity(2) for k in xrange(2)])
rand.multivariate_normal(means, covs)
but this results in ValueError: mean must be 1 dimensional
. Do I have to do a for loop for this? I thought that for functions like rand.binomial
this is possible.
As @hpaulj suggested, you can generate samples from the standard multivariate normal distribution, and then use, say, einsum
and/or broadcasting to transform the samples. The scaling is done by multiplying the standard sample points by the square root of the covariance matrix. In the following, I use scipy.linalg.sqrtm
to compute the matrix square root, and numpy.einsum
to do the matrix multiplication.
import numpy as np
from scipy.linalg import sqrtm
import matplotlib.pyplot as plt
# Sequence of means
means = np.array([[-15., 0.], [15., 0.], [0., 0.]])
# Sequence of covariance matrices. Must be the same length as means.
covs = np.array([[[ 3, -1],
[-1, 2]],
[[ 1, 2],
[ 2, 5]],
[[ 1, 0],
[ 0, 1]]])
# Number of samples to generate for each (mean, cov) pair.
nsamples = 4000
# Compute the matrix square root of each covariance matrix.
sqrtcovs = np.array([sqrtm(c) for c in covs])
# Generate samples from the standard multivariate normal distribution.
dim = len(means[0])
u = np.random.multivariate_normal(np.zeros(dim), np.eye(dim),
size=(len(means), nsamples,))
# u has shape (len(means), nsamples, dim)
# Transform u.
v = np.einsum('ijk,ikl->ijl', u, sqrtcovs)
m = np.expand_dims(means, 1)
t = v + m
# t also has shape (len(means), nsamples, dim).
# t[i] holds the nsamples sampled from the distribution with mean means[i]
# and covariance cov[i].
plt.subplot(2, 1, 1)
plt.plot(t[...,0].ravel(), t[...,1].ravel(), '.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()
# Make another plot, where we generate the samples by passing the given
# means and covs to np.random.multivariate_normal. This plot should look
# the same as the first plot.
plt.subplot(2, 1, 2)
p0 = np.random.multivariate_normal(means[0], covs[0], size=nsamples)
p1 = np.random.multivariate_normal(means[1], covs[1], size=nsamples)
p2 = np.random.multivariate_normal(means[2], covs[2], size=nsamples)
plt.plot(p0[:,0], p0[:,1], 'b.', alpha=0.02)
plt.plot(p1[:,0], p1[:,1], 'g.', alpha=0.02)
plt.plot(p2[:,0], p2[:,1], 'r.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()
This method might not be any faster that looping over the means
and covs
arrays and calling multivariate_normal
once for each pair (mean, cov). The case where this method would give the most benefit is when you have many different means and covariances and are generating a small number of samples per pair. And even then, it might not be faster, because the script uses a Python loop over the covs
array to call sqrtm
for each covariance matrix. If performance is critical, test with your actual data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With