
Efficient iteration over 3D array?

Tags: python, numpy

I am using Python and Numpy to do some data analysis.

I have a large 3D array (NxNxN), where each cell is itself a matrix, this time a 3x3 matrix. Calling the array data, it looks like this:

data[N,N,N,3,3]  

I need to find the eigenvalues of all the 3x3 matrices, and for that I use Numpy's eigvals routine, but it takes ages to do. Right now I pretty much do this:

for i in range(N):
    for j in range(N):
        for k in range(N):
            a = np.linalg.eigvals(data[i,j,k,:,:])

For N = 256, this takes about an hour. Any ideas on how to make this more efficient?

Many thanks for any suggestions!

digitaldingo asked Aug 07 '11


3 Answers

itertools.product is nicer than nested loops, aesthetically speaking. But I don't think it will make your code that much faster. My testing suggests that iteration is not your bottleneck.

>>> bigdata = numpy.arange(256 * 256 * 256 * 3 * 3).reshape(256, 256, 256, 3, 3)
>>> %timeit numpy.linalg.eigvals(bigdata[100, 100, 100, :, :])
10000 loops, best of 3: 52.6 us per loop

So, as a lower-bound estimate:

>>> .000052 * 256 * 256 * 256 / 60
14.540253866666665

That's 14 minutes minimum on my computer, which is pretty new. Let's see how long the loops take...

>>> def just_loops(N):
...     for i in xrange(N):
...         for j in xrange(N):
...             for k in xrange(N):
...                 pass
... 
>>> %timeit just_loops(256)
1 loops, best of 3: 350 ms per loop

Orders of magnitude smaller, as DSM said. Even the work of slicing the array alone is more substantial:

>>> def slice_loops(N, data):
...     for i in xrange(N):
...         for j in xrange(N):
...             for k in xrange(N):
...                 data[i, j, k, :, :]
... 
>>> %timeit slice_loops(256, bigdata)
1 loops, best of 3: 33.5 s per loop
senderle answered Oct 05 '22

I'm sure there is a good way to do this in NumPy, but in general, itertools.product is faster than nested loops over ranges.

from itertools import product

for i, j, k in product(xrange(N), xrange(N), xrange(N)):
    a = np.linalg.eigvals(data[i,j,k,:,:])
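As an aside: in NumPy 1.8 and later, `np.linalg.eigvals` is a generalized ufunc that broadcasts over all leading axes, treating the trailing two axes as a stack of matrices, so the Python-level loop can be dropped entirely. A minimal sketch (using a small N for illustration):

```python
import numpy as np

N = 4  # small N for illustration; the question uses N = 256
data = np.random.rand(N, N, N, 3, 3)

# eigvals operates on the last two axes and broadcasts over the rest,
# so the result has shape (N, N, N, 3): one set of 3 eigenvalues per cell
eigs = np.linalg.eigvals(data)
print(eigs.shape)  # (4, 4, 4, 3)
```

This pushes the whole loop into compiled LAPACK calls, which is typically far faster than calling `eigvals` once per 3x3 matrix from Python.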
agf answered Oct 05 '22


Since all the calculations are independent, you can use the multiprocessing module to speed things up if you have a multi-core processor.

HYRY answered Oct 05 '22