Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently reshape numpy array

I am working with NumPy arrays.

I have a 2N length vector D and want to reshape part of it into an N x N array C.

Right now this code does what I want, but is a bottleneck for larger N:

```

import numpy as np
M = 1000
t = np.arange(M)
D = np.sin(t)    # initial vector is a sin() function
N = M / 2
C = np.zeros((N,N))
for a in xrange(N):
    for b in xrange(N):
        C[a,b] = D[N + a - b]

```

Once C is made I go ahead and do some matrix arithmetic on it, etc.

This nested loop is pretty slow, but since this operation is essentially a change in indexing I figured that I could use NumPy's builtin reshape (numpy.reshape) to speed this part up.

Unfortunately, I cannot seem to figure out a good way of transforming these indices.

Any help speeding this part up?

like image 845
jjgoings Avatar asked Jun 14 '16 19:06

jjgoings


People also ask

Is reshape slow Numpy?

And the Numpy code runs on Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. The results show that with the large D, the Jax-version reshape is slower than Numpy and becomes slower and slower in the following steps.

What is the meaning of reshape (- 1 1?

In NumPy, -1 in reshape(-1) refers to an unknown dimension that the reshape() function calculates for you. It is like saying: “I will leave this dimension for the reshape() function to determine”. A common use case is to flatten a nested array of an unknown number of elements to a 1D array.

Which Numpy method is used to change the shape of array?

Reshaping means changing the shape of an array. The shape of an array is the number of elements in each dimension. By reshaping we can add or remove dimensions or change number of elements in each dimension.


1 Answers

You can use NumPy broadcasting to remove those nested loops -

C = D[N + np.arange(N)[:,None] - np.arange(N)]

One can also use np.take to replace the indexing, like so -

C = np.take(D,N + np.arange(N)[:,None] - np.arange(N))

A closer look reveals the pattern to be close to toeplitz and hankel matrices. So, using those, we would have two more approaches to solve it, though with comparable speedups as with broadcasting. The implementations would look something like these -

from scipy.linalg import toeplitz
from scipy.linalg import hankel

C = toeplitz(D[N:],np.hstack((D[0],D[N-1:0:-1])))
C = hankel(D[1:N+1],D[N:])[:,::-1]

Runtime test

In [230]: M = 1000
     ...: t = np.arange(M)
     ...: D = np.sin(t)    # initial vector is a sin() function
     ...: N = M / 2
     ...: 

In [231]: def org_app(D,N):
     ...:     C = np.zeros((N,N))
     ...:     for a in xrange(N):
     ...:         for b in xrange(N):
     ...:             C[a,b] = D[N + a - b]
     ...:     return C
     ...: 

In [232]: %timeit org_app(D,N)
     ...: %timeit D[N + np.arange(N)[:,None] - np.arange(N)]
     ...: %timeit np.take(D,N + np.arange(N)[:,None] - np.arange(N))
     ...: %timeit toeplitz(D[N:],np.hstack((D[0],D[N-1:0:-1])))
     ...: %timeit hankel(D[1:N+1],D[N:])[:,::-1]
     ...: 
10 loops, best of 3: 83 ms per loop
100 loops, best of 3: 2.82 ms per loop
100 loops, best of 3: 2.84 ms per loop
100 loops, best of 3: 2.95 ms per loop
100 loops, best of 3: 2.93 ms per loop
like image 68
Divakar Avatar answered Oct 11 '22 15:10

Divakar