I have the following code in Python (it is used with NumPy arrays or scipy.sparse matrices), and it works:
X[a,:][:,b]
But it doesn't look elegant. 'a' and 'b' are 1-D boolean masks: 'a' has length X.shape[0] and 'b' has length X.shape[1].
I tried X[a,b], but it doesn't work.
What I am trying to accomplish is to select particular rows and columns at the same time. For example, select rows 0, 7, 8, then from that result select columns 2, 3, 4.
How would you make this shorter and more elegant?
You could use np.ix_ for this kind of broadcasted indexing, like so:
X[np.ix_(a,b)]
This won't be any shorter than the original code, but it should be faster, because it avoids the intermediate array. The original code first creates X[a,:] with one indexing operation and then indexes that result again with [:,b] to give the final output.
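To illustrate why plain X[a,b] does not give the row/column block and what np.ix_ does differently, here is a minimal sketch. The array X and the index arrays rows and cols are made up for the illustration. With two index arrays, NumPy pairs them element by element; np.ix_ reshapes them so that they broadcast against each other instead. (With boolean masks, X[a,b] behaves the same way on the positions of the True entries, so it raises an IndexError when the two masks hold different numbers of True values.)

```python
import numpy as np

X = np.arange(30).reshape(6, 5)
rows = np.array([0, 2, 5])   # hypothetical row indices
cols = np.array([1, 3, 4])   # hypothetical column indices

# Plain X[rows, cols] pairs the indices element by element:
# it picks X[0, 1], X[2, 3], X[5, 4] and returns a 1-D array.
paired = X[rows, cols]

# np.ix_ reshapes the indices to shapes (3, 1) and (1, 3), so they
# broadcast to every row/column combination.
r, c = np.ix_(rows, cols)
block = X[r, c]              # same as X[np.ix_(rows, cols)]

# The same broadcasting written by hand.
manual = X[rows[:, None], cols]

print(paired.shape, block.shape)
```

Here block and manual both match X[rows,:][:,cols], but each is built in a single indexing step.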
Also, this method works whether a and b are integer index arrays or boolean masks.
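As a small check of that claim, the sketch below indexes the same made-up array once with boolean masks and once with the equivalent integer positions. The names a_idx and b_idx are introduced here only for the illustration; np.flatnonzero is used to turn each mask into integer indices.

```python
import numpy as np

X = np.arange(30).reshape(6, 5)

# Boolean masks, one entry per row and per column
a = np.array([True, False, True, False, False, True])
b = np.array([False, True, False, True, True])

# Equivalent integer index arrays (positions of the True entries)
a_idx = np.flatnonzero(a)    # rows 0, 2, 5
b_idx = np.flatnonzero(b)    # columns 1, 3, 4

from_bool = X[np.ix_(a, b)]
from_int = X[np.ix_(a_idx, b_idx)]

print(from_bool)
print(np.array_equal(from_bool, from_int))
```

Both forms select the same 3x3 block as X[a,:][:,b].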
Sample run
In [141]: X = np.random.randint(0,99,(6,5))
In [142]: m,n = X.shape
In [143]: a = np.in1d(np.arange(m),np.random.randint(0,m,(m)))
In [144]: b = np.in1d(np.arange(n),np.random.randint(0,n,(n)))
In [145]: X[a,:][:,b]
Out[145]:
array([[17, 81, 64],
       [87, 16, 54],
       [98, 22, 11],
       [26, 54, 64]])
In [146]: X[np.ix_(a,b)]
Out[146]:
array([[17, 81, 64],
       [87, 16, 54],
       [98, 22, 11],
       [26, 54, 64]])
Runtime test
In [147]: X = np.random.randint(0,99,(600,500))
In [148]: m,n = X.shape
In [149]: a = np.in1d(np.arange(m),np.random.randint(0,m,(m)))
In [150]: b = np.in1d(np.arange(n),np.random.randint(0,n,(n)))
In [151]: %timeit X[a,:][:,b]
1000 loops, best of 3: 1.74 ms per loop
In [152]: %timeit X[np.ix_(a,b)]
1000 loops, best of 3: 1.24 ms per loop
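For anyone who wants to repeat the comparison outside IPython, here is a rough standalone sketch using the standard timeit module. The seed, the mask density of 0.6, and the repeat counts are arbitrary choices made for this sketch, not values from the session above, and the timings will vary with machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed
X = rng.integers(0, 99, size=(600, 500))
a = rng.random(600) < 0.6                  # boolean row mask
b = rng.random(500) < 0.6                  # boolean column mask

def chained():
    # Two indexing steps, with an intermediate array X[a, :]
    return X[a, :][:, b]

def meshed():
    # One indexing step through np.ix_
    return X[np.ix_(a, b)]

# Both approaches must select the same block.
same = np.array_equal(chained(), meshed())

t_chained = min(timeit.repeat(chained, number=100, repeat=3))
t_meshed = min(timeit.repeat(meshed, number=100, repeat=3))
print(f"same result: {same}")
print(f"X[a,:][:,b]    : {t_chained / 100 * 1e3:.3f} ms per call")
print(f"X[np.ix_(a,b)] : {t_meshed / 100 * 1e3:.3f} ms per call")
```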