 

Using multiple levels of boolean index mask in NumPy

Tags:

python

numpy

I have the following code which first selects elements of a NumPy array with a logical index mask:

import numpy as np

grid = np.random.rand(4,4) 
mask = grid > 0.5

I wish to use a second boolean mask against this one to pick out a random subset of the selected elements:

masklength = len(grid[mask])
prob = 0.5
# generates a random array of bools, one per element selected by mask
second_mask = np.random.rand(masklength) < prob

# this fails to act on the original object
grid[mask][second_mask] = 100

This is not quite the same problem as the SO question "Numpy array, how to select indices satisfying multiple conditions?": because I am using random number generation, I don't want to generate a full-size mask, only values for the elements selected by the first mask.

Hemmer asked Aug 24 '11 17:08


3 Answers

Using flat indexing avoids much of the headache:

grid.flat[np.flatnonzero(mask)[second_mask]] = 100

Breaking it down:

ind = np.flatnonzero(mask)

generates a flat array of the indices where mask is True, which is then filtered further by applying second_mask:

ind = ind[second_mask] 

We could go on with as many masks as needed:

ind = ind[third_mask]

Finally,

grid.flat[ind] = 100

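indexes a flat version of grid with ind and assigns 100. grid.ravel()[ind] = 100 also works for a contiguous array such as grid, because ravel() then returns a flat view into the original array. For a non-contiguous array (e.g. a transpose or a strided slice), ravel() returns a copy and the assignment would be lost, whereas grid.flat always writes through to the original.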
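Since each further mask only filters the index array, the steps above generalize to any number of nested masks. A minimal sketch, assuming every mask after the first is a 1-D boolean array with one entry per index that survived the previous step; `assign_through_masks` is a hypothetical helper name, not a NumPy function. The last lines illustrate the ravel() caveat on a transposed array.

```python
import numpy as np


def assign_through_masks(arr, first_mask, later_masks, value):
    """Hypothetical helper: set arr (in place) to value at the elements picked
    by first_mask and then successively filtered by each 1-D mask in later_masks."""
    # flat (row-major) indices of the elements selected by the full-shape mask
    ind = np.flatnonzero(first_mask)
    for m in later_masks:
        m = np.asarray(m, dtype=bool)
        # each later mask needs one entry per index that survived so far
        if m.shape != ind.shape:
            raise ValueError("mask length does not match the number of selected elements")
        ind = ind[m]
    # arr.flat writes through to arr regardless of its memory layout
    arr.flat[ind] = value
    return arr


grid = np.arange(16, dtype=float).reshape(4, 4)
mask = grid > 7                                # selects 8.0 ... 15.0
second = np.array([True, False] * 4)           # keeps 8, 10, 12, 14
third = np.array([True, True, False, True])    # keeps 8, 10, 14
assign_through_masks(grid, mask, [second, third], 100)

# ravel() only returns a view when it can; for a transposed array it copies
t = np.arange(16, dtype=float).reshape(4, 4).T
t.ravel()[0] = -1     # lands in a temporary copy, so t[0, 0] stays 0.0
t.flat[0] = -1        # writes through, so t[0, 0] becomes -1.0
```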

Stefan answered Oct 24 '22 06:10


I believe the following does what you're asking:

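grid[tuple(a[second_mask] for a in np.where(mask))] = 100

It works as follows:

  • np.where(mask) converts the boolean mask into a tuple of index arrays (one per axis) giving the positions where mask is True;
  • a[second_mask] for a in ... subsets each index array to keep only the entries where second_mask is True;
  • tuple(...) packs the filtered arrays back into a tuple, which NumPy treats as one index array per axis (recent NumPy versions interpret a plain list as a single index array over the first axis instead).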
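To make these steps concrete, here is a small deterministic run. A fixed 2x2 array and a hand-written second_mask stand in for the random data in the question, purely for illustration:

```python
import numpy as np

grid = np.array([[0.9, 0.1],
                 [0.7, 0.6]])
mask = grid > 0.5

# one index array per axis: rows -> [0, 1, 1], cols -> [0, 0, 1]
rows, cols = np.where(mask)

# one entry per element selected by mask
second_mask = np.array([True, False, True])

# filter each index array, then pass them as a tuple (one array per axis)
idx = tuple(a[second_mask] for a in np.where(mask))   # (array([0, 1]), array([0, 1]))
grid[idx] = 100                                        # sets grid[0, 0] and grid[1, 1]
```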

The reason your original version doesn't work is that grid[mask] uses boolean indexing, which is a form of advanced ("fancy") indexing. Advanced indexing returns a copy of the data, so ...[second_mask] = 100 modifies that temporary copy rather than the original array.
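A short sketch of this copy behaviour on a small deterministic array (the values and the second mask are chosen for illustration only):

```python
import numpy as np

grid = np.arange(6.0).reshape(2, 3)
mask = grid > 2                            # selects 3.0, 4.0, 5.0

selected = grid[mask]                      # boolean indexing returns a new array
shares = np.shares_memory(selected, grid)  # False: a copy, not a view

# chained indexing: the assignment lands in a temporary copy, grid is unchanged
grid[mask][np.array([True, False, True])] = 100
unchanged = grid.copy()

# a single indexing step on grid itself assigns in place
grid[mask] = 100
```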

NPE answered Oct 24 '22 04:10


Another possible solution, which I came up with after thinking about this a bit more, is to make the second mask the same shape as grid (which may or may not be worth the memory cost) and fill in only the entries selected by the first mask:

#!/usr/bin/env python
import numpy as np

prob = 0.5
grid = np.random.rand(4, 4)

mask = grid > 0.5
masklength = np.sum(mask)

# start with an all-False mask the same shape as grid
# (plain bool: the np.bool alias was removed in NumPy 1.24)
second_mask = np.zeros(grid.shape, dtype=bool)
# then set only the positions selected by mask, using the second criterion
second_mask[mask] = np.random.rand(masklength) < prob

# a single boolean index now acts on the original object
grid[second_mask] = 100

Though this is a bit longer, it seems to read better (to my beginner eyes), and in my speed tests it ran in about the same time.
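All three approaches visit the selected elements in the same row-major (C) order, so for the same second_mask they should modify exactly the same cells. A quick consistency check, assuming a seeded np.random.default_rng generator in place of the legacy np.random.rand purely so the run is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded for reproducibility (an assumption, not from the answers)
prob = 0.5
grid = rng.random((4, 4))
mask = grid > 0.5
second_mask = rng.random(mask.sum()) < prob

# Approach 1: flat indexing
a = grid.copy()
a.flat[np.flatnonzero(mask)[second_mask]] = 100

# Approach 2: np.where index arrays, filtered and passed as a tuple
b = grid.copy()
b[tuple(ix[second_mask] for ix in np.where(mask))] = 100

# Approach 3: full-size second mask filled in at the positions selected by mask
c = grid.copy()
full = np.zeros(grid.shape, dtype=bool)
full[mask] = second_mask
c[full] = 100

same = np.array_equal(a, b) and np.array_equal(b, c)
```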

Hemmer answered Oct 24 '22 06:10