I have a 3D numpy array of shape (t, n1, n2)
:
x = np.random.rand(10, 2, 4)
I need to calculate another 3D array y
which is of shape (t, n1, n1)
such that:
y[0] = np.cov(x[0,:,:])
...and so on for all slices along the first axis.
So, a loopy implementation would be:
y = np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
y[i] = np.cov(x[i, :, :])
Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing:
x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)
But it didn't work.
cov() function. Covariance provides the a measure of strength of correlation between two variable or more set of variables. The covariance matrix element Cij is the covariance of xi and xj. The element Cii is the variance of xi.
The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy. The data type of the output of vectorized is determined by calling the function with the first element of the input.
Introduction to NumPy 3D array. Arrays in NumPy are the data structures with high performance which are suitable for mathematical operations. The three levels of arrays nested inside one another represent the three-dimensional array in python, where each level represents one dimension.
Hacked into numpy.cov source code
and tried using the default parameters. As it turns out, np.cov(x[i,:,:])
would be simply :
N = x.shape[2]
m = x[i,:,:]
m -= np.sum(m, axis=1, keepdims=True) / N
cov = np.dot(m, m.T) /(N - 1)
So, the task was to vectorize this loop that would iterate through i
and process all of the data from x
in one go. For the same, we could use broadcasting
at the third step. For the final step, we are performing sum-reduction
there along all slices in first axis. This could be efficiently implemented in a vectorized manner with np.einsum
. Thus, the final implementation came to this -
N = x.shape[2]
m1 = x - x.sum(2,keepdims=1)/N
y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
Runtime test
In [155]: def original_app(x):
...: n = x.shape[0]
...: y = np.zeros((n,2,2))
...: for i in np.arange(x.shape[0]):
...: y[i]=np.cov(x[i,:,:])
...: return y
...:
...: def proposed_app(x):
...: N = x.shape[2]
...: m1 = x - x.sum(2,keepdims=1)/N
...: out = np.einsum('ijk,ilk->ijl',m1,m1) / (N - 1)
...: return out
...:
In [156]: # Setup inputs
...: n = 10000
...: x = np.random.rand(n,2,4)
...:
In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True # Results verified
In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop
In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
Huge speedup there!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With