
compress numpy array(matrix) by removing columns using another numpy array as mask

Tags:

python

numpy

I have a 2D numpy array (i.e. a matrix) A which contains useful data interspersed with garbage in the form of column vectors, as well as a 'selection' array B which contains 1 for those columns that are important and 0 for those that are not. Is there a way to select only those columns from A that correspond to ones in B? That is, I have a matrix

A = array([[ 0,  1,  2,  3,  4],   and a vector B = array([ 0,  1,  0,  1,  0])
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19],
           [20, 21, 22, 23, 24]])

and I want

array([[1,   3],
       [6,   8],
       [11, 13],
       [16, 18],
       [21, 23]])

Is there an elegant way to do so? Right now I just have a for loop that iterates through B.

NOTE: the matrices that I'm dealing with are large, so I don't want to use numpy masked arrays, as I simply don't want the masked data.

pratikm asked Dec 10 '11 00:12



2 Answers

>>> import numpy as NP
>>> A = NP.arange(25).reshape(5, 5)
>>> A
  array([[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19],
         [20, 21, 22, 23, 24]])
>>> B = NP.array([ 0,  1,  0,  1,  0])

>>> # convert the indexing array to a boolean array
>>> B = NP.array(B, dtype=bool)

>>> # index A against B--indexing array is placed after the ',' because
>>> # you are selecting columns

>>> res = A[:,B]

>>> res
  array([[ 1,  3],
         [ 6,  8],
         [11, 13],
         [16, 18],
         [21, 23]])  
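
The conversion to a boolean array in the session above matters because NumPy treats an integer array as a list of positions rather than as a mask. A minimal sketch of the difference, assuming the same A and B as in the question (the names as_integers and as_mask are only illustrative):

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
B = np.array([0, 1, 0, 1, 0])

# Integer index array: each entry is read as a column number,
# so this picks columns 0, 1, 0, 1, 0 (five columns, with repeats).
as_integers = A[:, B]

# Boolean mask: True keeps the column, False drops it.
as_mask = A[:, B.astype(bool)]

print(as_integers.shape)  # (5, 5)
print(as_mask.shape)      # (5, 2)
```

Skipping the conversion does not raise an error, so the mistake is easy to miss; checking the shape of the result is a quick sanity test.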


The syntax for index-based slicing in NumPy is elegant and simple. A couple of rules cover a majority of use cases:

  • the form is [rows, columns]

  • specify all rows or all columns using a colon ":" e.g., [:, 4] (extracts the entire 5th column)
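
The two rules above can be sketched on the 5x5 matrix from the question. The variable names here are only for illustration:

```python
import numpy as np

A = np.arange(25).reshape(5, 5)

# [rows, columns]: a colon means "everything along this axis".
fifth_column = A[:, 4]   # all rows, column index 4
second_row = A[1, :]     # row index 1, all columns
block = A[1:3, 0:2]      # rows 1-2, columns 0-1

print(fifth_column)      # [ 4  9 14 19 24]
print(second_row)        # [5 6 7 8 9]
```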

doug answered Sep 23 '22 04:09


Not sure if it's the most efficient way (because of the transposition), but it should be better than a for loop:

A.T[B == 1].T
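
For comparison, here is a small sketch that checks the transposed form against two alternatives that avoid the transposes: indexing the column axis directly, and np.compress. It assumes the same A and B as in the question; the variable names are illustrative.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
B = np.array([0, 1, 0, 1, 0])

# The transposed form from this answer: select rows of A.T, then flip back.
via_transpose = A.T[B == 1].T

# Boolean mask applied directly to the column axis.
direct = A[:, B == 1]

# np.compress keeps the slices along `axis` where the condition is true.
compressed = np.compress(B, A, axis=1)

print(np.array_equal(via_transpose, direct))   # True
print(np.array_equal(direct, compressed))      # True
print(np.shares_memory(direct, A))             # False
```

All three give the same result. Since .T returns a view, the transposes themselves should not copy data, and the boolean selection produces a new array holding only the kept columns, which fits the question's wish not to carry the masked-out data around.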
David Z answered Sep 22 '22 04:09