Vectorizing NumPy covariance for 3D array

I have a 3D numpy array of shape (t, n1, n2):

x = np.random.rand(10, 2, 4)

I need to calculate another 3D array y which is of shape (t, n1, n1) such that:

y[0] = np.cov(x[0,:,:])

...and so on for all slices along the first axis.

So, a loopy implementation would be:

y = np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
    y[i] = np.cov(x[i, :, :])

Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing:

x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)

But it didn't work.
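A quick way to see why this attempt falls short (a sketch, shapes only, assuming x as above): np.dot on 3-D arrays is not a batched matrix product, and the mean subtraction that np.cov performs is missing.

import numpy as np

x = np.random.rand(10, 2, 4)
x1 = x.swapaxes(1, 2)        # shape (10, 4, 2)

# np.dot on 3-D inputs sums over the last axis of x and the second-to-last
# axis of x1, pairing every slice with every other slice, so the result is 4-D:
print(np.dot(x, x1).shape)   # (10, 2, 10, 2)

# np.matmul (the @ operator) does broadcast over the leading axis,
# but it still lacks the mean subtraction that np.cov performs:
print((x @ x1).shape)        # (10, 2, 2)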

asked Nov 03 '16 by dayum


1 Answer

Digging into the numpy.cov source code and keeping the default parameters, np.cov(x[i,:,:]) boils down to simply:

N = x.shape[2]
m = x[i, :, :]
m = m - m.sum(axis=1, keepdims=True) / N   # subtract the row means (as a copy, so x is not modified in place)
cov = np.dot(m, m.T) / (N - 1)             # unbiased estimate, matching np.cov's default

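A quick sanity check for a single slice (a sketch, assuming x as in the question and i = 0 for illustration):

import numpy as np

x = np.random.rand(10, 2, 4)
i = 0                                            # any valid slice index

N = x.shape[2]
m = x[i] - x[i].sum(axis=1, keepdims=True) / N   # subtract the row means
cov = np.dot(m, m.T) / (N - 1)

print(np.allclose(cov, np.cov(x[i])))            # True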
So, the task is to vectorize this loop so that all slices of x along the first axis are processed in one go. The mean subtraction can be done with broadcasting, and the final step, a sum-reduction over the last axis for every slice along the first axis, can be implemented efficiently with np.einsum. The final implementation then comes to this -

N = x.shape[2]
m1 = x - x.sum(2, keepdims=True) / N                  # broadcasted mean subtraction along the last axis
y_out = np.einsum('ijk,ilk->ijl', m1, m1) / (N - 1)   # batched m1 @ m1.T for every slice

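An equivalent formulation (not from the original answer) uses np.matmul via the @ operator, which broadcasts over the leading axis; it should give the same result as the einsum above:

import numpy as np

x = np.random.rand(10, 2, 4)

N = x.shape[2]
m1 = x - x.sum(2, keepdims=True) / N

# batched m1 @ m1.T over the first axis, same result as the einsum version
y_alt = m1 @ m1.swapaxes(1, 2) / (N - 1)

print(np.allclose(y_alt, np.einsum('ijk,ilk->ijl', m1, m1) / (N - 1)))  # True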
Runtime test

In [155]: def original_app(x):
     ...:     n = x.shape[0]
     ...:     y = np.zeros((n,2,2))
     ...:     for i in np.arange(x.shape[0]):
     ...:         y[i]=np.cov(x[i,:,:])
     ...:     return y
     ...: 
     ...: def proposed_app(x):
     ...:     N = x.shape[2]
     ...:     m1 = x - x.sum(2,keepdims=1)/N
     ...:     out = np.einsum('ijk,ilk->ijl',m1,m1)  / (N - 1)
     ...:     return out
     ...: 

In [156]: # Setup inputs
     ...: n = 10000
     ...: x = np.random.rand(n,2,4)
     ...: 

In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True  # Results verified

In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop

In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop

Huge speedup there!

answered Oct 19 '22 by Divakar