What is the fastest way to extract given rows and columns from a NumPy ndarray?

I have a large (approx. 14,000 x 14,000) square matrix stored as a NumPy ndarray. I want to extract a large number of rows and columns to get a new square matrix (approx. 10,000 x 10,000). I know the indices in advance: they are all the rows and columns that are not all-zero.

The fastest way I have found to do this is:

> timeit A[np.ix_(indices, indices)]
1 loops, best of 3: 6.19 s per loop

However, this is much slower than multiplying the matrix by itself elementwise (note that np.multiply computes the elementwise product, not the matrix product):

> timeit np.multiply(A, A)
1 loops, best of 3: 982 ms per loop

This seems strange: both the row/column extraction and the multiplication need to allocate a new array (and the multiplication result, at 14,000 x 14,000, is larger than the 10,000 x 10,000 extraction), but the multiplication also needs to perform additional computation.

Thus, the question: is there a more efficient way to perform the extraction, in particular one that is at least as fast as this multiplication?
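
For reference, here is a reduced-scale sketch of the setup described above. It is only an illustration under assumptions: the helper name `nonzero_index`, the 8 x 8 size, and the `dead` indices are hypothetical, and it assumes an index is kept when either its row or its column has a nonzero entry (the real data may call for a different rule).

```python
import numpy as np

def nonzero_index(A):
    """Return indices i where row i or column i of square A has a nonzero entry.

    Hypothetical helper: the keep-rule (row OR column nonzero) is an assumption.
    """
    keep = A.any(axis=1) | A.any(axis=0)
    return np.flatnonzero(keep)

# Small stand-in for the 14,000 x 14,000 matrix (size chosen for illustration).
rng = np.random.default_rng(0)
N = 8
A = rng.random((N, N)) + 1.0   # strictly positive entries
dead = [2, 5]                  # hypothetical indices to zero out
A[dead, :] = 0.0               # zero the rows
A[:, dead] = 0.0               # zero the columns

indices = nonzero_index(A)

# np.ix_ builds an open mesh so that every (row, column) pair is selected.
B = A[np.ix_(indices, indices)]
print(indices, B.shape)
```

Here `B` is the square submatrix with the all-zero rows and columns dropped, which is the operation being timed in the question.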

jveldridge asked Aug 29 '14 19:08



1 Answer

If I try to reproduce your problem, I don't see such a drastic effect. I notice that depending on how many indices you choose, the indexing can even be faster than the multiplication.

>>> import numpy as np
>>> np.__version__
Out[1]: '1.9.0'
>>> N = 14000
>>> A = np.random.random(size=[N, N])

>>> indices = np.sort(np.random.choice(np.arange(N), int(0.9*N), replace=False))
>>> timeit A[np.ix_(indices, indices)]
1 loops, best of 3: 1.02 s per loop
>>> timeit A.take(indices, axis=0).take(indices, axis=1)
1 loops, best of 3: 1.37 s per loop
>>> timeit np.multiply(A,A)
1 loops, best of 3: 748 ms per loop

>>> indices = np.sort(np.random.choice(np.arange(N), int(0.7*N), replace=False))
>>> timeit A[np.ix_(indices, indices)]
1 loops, best of 3: 633 ms per loop
>>> timeit A.take(indices, axis=0).take(indices, axis=1)
1 loops, best of 3: 946 ms per loop
>>> timeit np.multiply(A,A)
1 loops, best of 3: 728 ms per loop
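
If you want to rerun this comparison outside IPython, something like the following sketch may help. It uses the standard `timeit` module at a reduced size (N = 1000 is my choice so it runs quickly, not the original 14000), checks that the variants return the same array, and also times a two-step `A[idx][:, idx]` variant that was not part of the measurements above. The function names are hypothetical, and the timings will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
N = 1000                       # reduced from 14000 so the sketch runs quickly
k = (7 * N) // 10              # keep 70% of the indices
A = rng.random((N, N))
indices = np.sort(rng.choice(N, size=k, replace=False))

def via_ix(A, idx):
    # Open-mesh fancy indexing, as in the question.
    return A[np.ix_(idx, idx)]

def via_take(A, idx):
    # Select rows first, then columns, using take.
    return A.take(idx, axis=0).take(idx, axis=1)

def via_two_step(A, idx):
    # Same idea with plain indexing: rows first, then columns.
    return A[idx][:, idx]

ref = via_ix(A, indices)
same = all(np.array_equal(ref, f(A, indices)) for f in (via_take, via_two_step))
print("all variants agree:", same)

for f in (via_ix, via_take, via_two_step):
    # Best of 3 single runs, mirroring the "best of 3" figures above.
    t = min(timeit.repeat(lambda: f(A, indices), number=1, repeat=3))
    print(f"{f.__name__}: {t * 1e3:.1f} ms")
```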
physicalattraction answered Oct 16 '22 13:10