Using numpy I can simulate unconditionally from a multivariate normal distribution by
mean = [0, 0]
cov = [[1, 0], [0, 100]] # diagonal covariance
x, y = np.random.multivariate_normal(mean, cov, 5000).T
How do I simulate y from the same distribution, given that I have 5000 realizations of x? I'm looking for a generalized solution that can be scaled to an arbitrary dimension.
Looking up in Eaton, Morris L. (1983). Multivariate Statistics: a Vector Space Approach, I gathered following example solution for a 4 vaiable system, with 2 dependent variables (the first two) and 2 independent variables (the last two)
import numpy as np
mean = np.array([1, 2, 3, 4])
cov = np.array(
[[ 1.0, 0.5, 0.3, -0.1],
[ 0.5, 1.0, 0.1, -0.2],
[ 0.3, 0.1, 1.0, -0.3],
[-0.1, -0.2, -0.3, 0.1]]) # diagonal covariance
c11 = cov[0:2, 0:2] # Covariance matrix of the dependent variables
c12 = cov[0:2, 2:4] # Custom array only containing covariances, not variances
c21 = cov[2:4, 0:2] # Same as above
c22 = cov[2:4, 2:4] # Covariance matrix of independent variables
m1 = mean[0:2].T # Mu of dependent variables
m2 = mean[2:4].T # Mu of independent variables
conditional_data = np.random.multivariate_normal(m2, c22, 1000)
conditional_mu = m2 + c12.dot(np.linalg.inv(c22)).dot((conditional_data - m2).T).T
conditional_cov = np.linalg.inv(np.linalg.inv(cov)[0:2, 0:2])
dependent_data = np.array([np.random.multivariate_normal(c_mu, conditional_cov, 1)[0] for c_mu in conditional_mu])
print np.cov(dependent_data.T, conditional_data.T)
>> [[ 1.0012233 0.49592165 0.28053086 -0.08822537]
[ 0.49592165 0.98853341 0.11168755 -0.22584691]
[ 0.28053086 0.11168755 0.91688239 -0.27867207]
[-0.08822537 -0.22584691 -0.27867207 0.94908911]]
which is acceptably close to the pre-defined covariance matrix. The solution is also briefly described on Wikipedia
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With