I am using Python and Numpy to do some data analysis.
I have a large 3D array (NxNxN), where each cell is itself a 3x3 matrix. Calling the array data, it looks like this:

    data[N, N, N, 3, 3]
I need to find the eigenvalues of all the 3x3 matrices, and for that I use NumPy's eigvals routine, but it takes ages to do. Right now I pretty much do this:
    for i in range(N):
        for j in range(N):
            for k in range(N):
                a = np.linalg.eigvals(data[i, j, k, :, :])
For N = 256, this takes about an hour. Any ideas on how to make this more efficient?
Many thanks for any suggestions!
itertools.product is nicer than nested loops, aesthetically speaking. But I don't think it will make your code much faster. My testing suggests that iteration is not your bottleneck.
>>> bigdata = numpy.arange(256 * 256 * 256 * 3 * 3).reshape(256, 256, 256, 3, 3)
>>> %timeit numpy.linalg.eigvals(bigdata[100, 100, 100, :, :])
10000 loops, best of 3: 52.6 us per loop
So, as an underestimate:
>>> .000052 * 256 * 256 * 256 / 60
14.540253866666665
That's 14 minutes minimum on my computer, which is pretty new. Let's see how long the loops take...
>>> def just_loops(N):
...     for i in xrange(N):
...         for j in xrange(N):
...             for k in xrange(N):
...                 pass
...
>>> %timeit just_loops(256)
1 loops, best of 3: 350 ms per loop
Orders of magnitude smaller, as DSM said. Even the work of slicing the array alone is more substantial:
>>> def slice_loops(N, data):
...     for i in xrange(N):
...         for j in xrange(N):
...             for k in xrange(N):
...                 data[i, j, k, :, :]
...
>>> %timeit slice_loops(256, bigdata)
1 loops, best of 3: 33.5 s per loop
I'm sure there is a good way to do this in NumPy, but in general, itertools.product
is faster than nested loops over ranges.
    from itertools import product

    for i, j, k in product(xrange(N), xrange(N), xrange(N)):
        a = np.linalg.eigvals(data[i, j, k, :, :])
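As it turns out, the "good way to do this in NumPy" does exist: since NumPy 1.8, np.linalg.eigvals accepts a stacked array of shape (..., M, M) and broadcasts over the leading axes, so a single call replaces the triple loop entirely. A minimal sketch, using a toy N = 4 instead of 256 (the same one-liner applies unchanged at N = 256):

```python
import numpy as np

# Toy size for illustration; the same call works for N = 256.
N = 4
data = np.random.rand(N, N, N, 3, 3)

# np.linalg.eigvals (NumPy >= 1.8) broadcasts over the leading axes
# of a (..., M, M) input, returning a (..., M) array of eigenvalues.
vals = np.linalg.eigvals(data)  # shape (N, N, N, 3)
```

This pushes the whole loop into LAPACK calls driven from C, which is typically far faster than calling eigvals a quarter of a billion times from Python.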
Since all the calculations are independent, you can use the multiprocessing module to speed things up if you have a multi-core processor.