Vectorizing NumPy covariance for 3D array

I have a 3D numpy array of shape (t, n1, n2):

x = np.random.rand(10, 2, 4)

I need to calculate another 3D array y which is of shape (t, n1, n1) such that:

y[0] = np.cov(x[0,:,:])

...and so on for all slices along the first axis.

So, a loopy implementation would be:

y = np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
    y[i] = np.cov(x[i, :, :])

Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing:

x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)

But it didn't work.
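A quick way to see why this attempt falls short (a sketch, shapes only, assuming x as above): np.dot on 3-D arrays is not a batched matrix product, and the mean subtraction that np.cov performs is missing.

import numpy as np

x = np.random.rand(10, 2, 4)
x1 = x.swapaxes(1, 2)        # shape (10, 4, 2)

# np.dot on 3-D inputs sums over the last axis of x and the second-to-last
# axis of x1, pairing every slice with every other slice, so the result is 4-D:
print(np.dot(x, x1).shape)   # (10, 2, 10, 2)

# np.matmul (the @ operator) does broadcast over the leading axis,
# but it still lacks the mean subtraction that np.cov performs:
print((x @ x1).shape)        # (10, 2, 2)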

asked Nov 03 '16 by dayum


1 Answer

Digging into the numpy.cov source code and keeping the default parameters, np.cov(x[i,:,:]) boils down to simply:

N = x.shape[2]
m = x[i, :, :]
m = m - m.sum(axis=1, keepdims=True) / N   # subtract the row means (as a copy, so x is not modified in place)
cov = np.dot(m, m.T) / (N - 1)             # unbiased estimate, matching np.cov's default

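A quick sanity check for a single slice (a sketch, assuming x as in the question and i = 0 for illustration):

import numpy as np

x = np.random.rand(10, 2, 4)
i = 0                                            # any valid slice index

N = x.shape[2]
m = x[i] - x[i].sum(axis=1, keepdims=True) / N   # subtract the row means
cov = np.dot(m, m.T) / (N - 1)

print(np.allclose(cov, np.cov(x[i])))            # True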
So, the task is to vectorize this loop so that all slices of x along the first axis are processed in one go. The mean subtraction can be done with broadcasting, and the final step, a sum-reduction over the last axis for every slice along the first axis, can be implemented efficiently with np.einsum. The final implementation then comes to this -

N = x.shape[2]
m1 = x - x.sum(2, keepdims=True) / N                  # broadcasted mean subtraction along the last axis
y_out = np.einsum('ijk,ilk->ijl', m1, m1) / (N - 1)   # batched m1 @ m1.T for every slice

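An equivalent formulation (not from the original answer) uses np.matmul via the @ operator, which broadcasts over the leading axis; it should give the same result as the einsum above:

import numpy as np

x = np.random.rand(10, 2, 4)

N = x.shape[2]
m1 = x - x.sum(2, keepdims=True) / N

# batched m1 @ m1.T over the first axis, same result as the einsum version
y_alt = m1 @ m1.swapaxes(1, 2) / (N - 1)

print(np.allclose(y_alt, np.einsum('ijk,ilk->ijl', m1, m1) / (N - 1)))  # True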
Runtime test

In [155]: def original_app(x):
     ...:     n = x.shape[0]
     ...:     y = np.zeros((n,2,2))
     ...:     for i in np.arange(x.shape[0]):
     ...:         y[i]=np.cov(x[i,:,:])
     ...:     return y
     ...: 
     ...: def proposed_app(x):
     ...:     N = x.shape[2]
     ...:     m1 = x - x.sum(2,keepdims=1)/N
     ...:     out = np.einsum('ijk,ilk->ijl',m1,m1)  / (N - 1)
     ...:     return out
     ...: 

In [156]: # Setup inputs
     ...: n = 10000
     ...: x = np.random.rand(n,2,4)
     ...: 

In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True  # Results verified

In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop

In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop

Huge speedup there!

answered Oct 19 '22 by Divakar