
Asymmetric slicing python

Consider the following matrix:

X = np.arange(9).reshape(3,3)
     array([[0, 1, 2],
            [3, 4, 5],
            [6, 7, 8]]) 

Let's say I want to extract the following array from it:

array([[0, 4, 2],
       [3, 7, 5]])

This is possible by indexing a different set of rows for each column, for instance:

col=[0,1,2] 
row = [[0,1],[1,2],[0,1]]

I can then build the result in a variable named array with the following loop:

array=np.zeros([2,3],dtype='int64')
for i in range(3):
    array[:,i]=X[row[i],col[i]]

Is there a way to vectorize this kind of operation with broadcasting? I need it as a data-cleaning step for a large file (~5 GB), and I would like to use dask to parallelize it. As a first step, though, I would be happy just to avoid the for loop.

jmamath asked Apr 01 '18 15:04


1 Answer

With NumPy's advanced indexing, where the row and column index arrays are broadcast against each other, it would be:

X[row, np.asarray(col)[:,None]].T

Sample run:

In [9]: X
Out[9]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [10]: col=[0,1,2] 
    ...: row = [[0,1],[1,2],[0,1]]

In [11]: X[row, np.asarray(col)[:,None]].T
Out[11]: 
array([[0, 4, 2],
       [3, 7, 5]])
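
To see why this works, it may help to look at the shapes involved. The sketch below is only an illustration: the names row_idx, col_idx, picked, alt and expected are mine, not from the question. It spells out the broadcast step by step, shows an equivalent form that transposes the row indices up front (so no trailing .T is needed), and checks both against the original loop.

```python
import numpy as np

X = np.arange(9).reshape(3, 3)
col = [0, 1, 2]
row = [[0, 1], [1, 2], [0, 1]]

row_idx = np.asarray(row)           # shape (3, 2): two row indices per column
col_idx = np.asarray(col)[:, None]  # shape (3, 1): one column index per entry of row

# The index arrays broadcast to (3, 2); picked[j, i] == X[row[j][i], col[j]].
picked = X[row_idx, col_idx]
result = picked.T                   # shape (2, 3), same as the one-liner above

# Equivalent form: transpose the row indices first, so (2, 3) broadcasts with (3,).
alt = X[row_idx.T, np.asarray(col)]

# Reference: the loop from the question.
expected = np.zeros((2, 3), dtype=X.dtype)
for i in range(3):
    expected[:, i] = X[row[i], col[i]]

print(result)
print(np.array_equal(result, expected), np.array_equal(alt, expected))
```

Both forms assume that every entry of row lists the same number of row indices; otherwise np.asarray(row) would not produce a regular 2-D index array.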
Divakar answered Oct 12 '22 23:10