Mask 2D array preserving shape

I have a 2D NumPy array like this:

import numpy as np

arr = np.array([[1,2,4],
                [2,1,1],
                [1,2,3]])

and a boolean array:

boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True,True]])

When I index arr with boolarr, the result is flattened to 1D:

arr[boolarr]

Output:

array([1, 2, 1, 1, 2, 3])

But I would like a 2D output that keeps the rows separate. The desired output is:

[[1, 2],
 [1],
 [1, 2, 3]]
user3788040 asked Dec 24 '18 23:12


2 Answers

One NumPy option is to start by counting the True values in each row of the mask:

take = boolarr.sum(axis=1)
#array([2, 1, 3])

Then index the array with the mask as you already do, which returns the selected elements in row-major order:

x = arr[boolarr]
#array([1, 2, 1, 1, 2, 3])

Finally, use np.split to cut the flat array at the np.cumsum of take. np.split expects the indices at which to split, and the last cumulative sum is just the length of the flat array, so it is dropped with [:-1]:

np.split(x, np.cumsum(take)[:-1])
# [array([1, 2]), array([1]), array([1, 2, 3])]
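
Putting the three steps together, here is a minimal self-contained sketch using the arrays from the question. The names flat, bounds and rows_as_lists are illustrative and not part of the original answer; the conversion to plain lists is only there to make the result easy to compare.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

take = boolarr.sum(axis=1)    # number of selected elements per row
flat = arr[boolarr]           # selected elements, flattened in row-major order
bounds = np.cumsum(take)      # running totals; the last one equals len(flat)

# Split before each new row starts. The final total is dropped because
# splitting at len(flat) would only append an empty trailing piece.
rows = np.split(flat, bounds[:-1])

# Plain Python lists are easier to compare than a list of arrays.
rows_as_lists = [r.tolist() for r in rows]
print(rows_as_lists)
```

With the question's data, bounds is [2, 3, 6], so the flat array is cut at positions 2 and 3.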

General solution

def mask_nd(x, m):
    '''
    Mask a 2D array, keeping the selected
    elements grouped by row.

    Parameters
    ----------
    x: np.array
       2D array on which to apply a mask
    m: np.array
       2D boolean mask with the same shape as x

    Returns
    -------
    List of 1D arrays. Each array contains the
    elements selected from the corresponding row of x.
    If no elements in a row are selected, the
    corresponding array will be empty.
    '''
    take = m.sum(axis=1)
    return np.split(x[m], np.cumsum(take)[:-1])

Examples

Let's look at some examples:

arr = np.array([[1,2,4],
                [2,1,1],
                [1,2,3]])

boolarr = np.array([[True, True, False],
                    [False, False, False],
                    [True, True,True]])

mask_nd(arr, boolarr)
# [array([1, 2]), array([], dtype=int32), array([1, 2, 3])]
# (the dtype shown for the empty array depends on the platform's default integer type)

Or for the following arrays:

arr = np.array([[1,2],
                [2,1]])

boolarr = np.array([[True, True],
                    [True, False]])

mask_nd(arr, boolarr)
# [array([1, 2]), array([2])]
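
Because the printed dtype of an empty array varies between platforms, it can be easier to check results after converting each row to a plain list. The sketch below repeats mask_nd from above so that it runs on its own; the helper name as_lists is hypothetical and only used for illustration.

```python
import numpy as np

def mask_nd(x, m):
    # Same logic as the general solution: count selections per row,
    # then split the flat selection at the row boundaries.
    take = m.sum(axis=1)
    return np.split(x[m], np.cumsum(take)[:-1])

def as_lists(rows):
    # Hypothetical helper: turn a list of 1D arrays into a list of lists.
    return [row.tolist() for row in rows]

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])

# A row with nothing selected yields an empty list.
some_empty = np.array([[True, True, False],
                       [False, False, False],
                       [True, True, True]])
print(as_lists(mask_nd(arr, some_empty)))

# A mask that selects nothing still yields one (empty) entry per row.
none_selected = np.zeros_like(arr, dtype=bool)
print(as_lists(mask_nd(arr, none_selected)))
```

The second case works because np.split accepts repeated split indices and returns empty pieces between them.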
yatu answered Nov 03 '22 03:11


Your desired output is not a 2D array, since each "row" has a different number of "columns". A functional, non-vectorised solution that returns a list of lists is possible via itertools.compress, which keeps the items of a row whose matching mask value is True:

from itertools import compress

res = list(map(list, map(compress, arr, boolarr)))

# [[1, 2], [1], [1, 2, 3]]
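
If a rectangular result is what is actually needed, a different technique from either answer is a NumPy masked array, which keeps the original shape and hides the unselected entries instead of removing them. This is only a sketch under that assumption; the fill value -1 is an arbitrary placeholder chosen for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# In np.ma, mask=True means "hidden", so the selection mask is inverted.
masked = np.ma.masked_array(arr, mask=~boolarr)

# Reductions skip the hidden entries.
row_sums = masked.sum(axis=1)

# A plain ndarray with a placeholder where entries were hidden.
filled = masked.filled(-1)

print(masked.shape)
print(row_sums.tolist())
print(filled.tolist())
```

Whether this fits depends on what happens next: ragged lists suit per-row iteration, while a masked array suits further vectorised computation.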
jpp answered Nov 03 '22 04:11