 

Can I make a random mask with NumPy?

I'm doing image processing using Python.

I am trying to randomly extract some pixels from an image.

Is it possible to make a random mask with NumPy?

What I have in mind is to set 1000 elements of a 10000-element array to True and all the others to False. Is that possible?

If not, is there any other way to make a random mask? Thank you.

HAL asked Dec 22 '17 11:12


2 Answers

Create an array of False values and set the first 1000 elements to True:

import numpy as np

a = np.full(10000, False)
a[:1000] = True

Afterwards, simply shuffle the array in place:

np.random.shuffle(a)

For a slightly faster solution you can also create an array of integer zeros, set some values to 1, shuffle and cast it to bool:

a = np.zeros(10000, dtype=int)
a[:1000] = 1
np.random.shuffle(a)
a = a.astype(bool)

In both cases you will have an array a with exactly 1000 True elements at random positions.
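Since the question is about extracting pixels, here is a minimal sketch of how such a mask could be applied to an image. The 100 x 100 random array is a hypothetical stand-in for a real image, and using a seeded Generator (np.random.default_rng) instead of np.random.shuffle is an assumption made only to keep the example reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary, used only for reproducibility

# Hypothetical stand-in for a real grayscale image (100 x 100 = 10000 pixels)
image = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)

# Flat mask with exactly 1000 True values, shuffled in place
mask = np.zeros(image.size, dtype=bool)
mask[:1000] = True
rng.shuffle(mask)

# Give the mask the same shape as the image so it can index it directly
mask = mask.reshape(image.shape)

# Boolean indexing returns a 1-D array holding the selected pixel values
pixels = image[mask]
```

For a colour image of shape (height, width, channels), a 2-D mask built the same way should select whole pixels, giving an array of shape (1000, channels).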

If instead you want each element to be picked independently from [True, False], you could use

np.random.choice([True, False], size=10000, p=[0.1, 0.9])

but note that you cannot predict the exact number of True elements in the array. You only know that, on average, there will be 1000 of them.
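To illustrate that the count varies, the following sketch draws such a mask several times and records how many True values each draw contains. It uses the Generator version of choice with an arbitrary seed, which is an assumption for reproducibility; the legacy np.random.choice call above behaves the same way in this respect.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

counts = []
for _ in range(5):
    # Each element is True with probability 0.1, independently of the others
    mask = rng.choice([True, False], size=10000, p=[0.1, 0.9])
    counts.append(int(mask.sum()))

# The counts scatter around 1000 rather than being exactly 1000 every time
print(counts)
```

The standard deviation of the count is sqrt(10000 * 0.1 * 0.9) = 30, so values within a few tens of 1000 are typical.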

Nils Werner answered Sep 19 '22 05:09


A common solution is to create an array of random integer indices, which can be done efficiently with NumPy's choice function.

With this setup:

import numpy as np

n_dim = 10_000  # size of the original array
n = 100         # size of the random mask
rng = np.random.default_rng(123)

To create the array of random indices we can use NumPy's choice, passing the array size as the first argument:

In [5]: %%timeit
   ...: m = rng.choice(n_dim, replace=False, size=n)
   ...:
21.9 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As a comparison, the boolean array approach mentioned in the other answer (which requires shuffling an array of 0s and 1s) is considerably slower (more than 10x slower in this example):

In [7]: %%timeit
   ...: m = np.hstack([np.ones(n, dtype=bool), np.zeros(n_dim - n, dtype=bool)])
   ...: rng.shuffle(m)
   ...:
261 µs ± 604 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

NOTE: Integer indexing works best in the sparse case, i.e. when selecting a small fraction of samples from the original array. In that case the RAM usage of an integer index is much lower than that of a boolean mask. When the fraction of samples exceeds roughly 10 to 20% of the original array, the bool mask approach becomes more efficient.
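The memory trade-off can be checked with a rough sketch like the one below. It assumes a bool mask costs one byte per element of the original array and reads the index item size from the array itself (typically 8 bytes); the break-even fraction it computes is only an estimate of memory use and says nothing about speed.

```python
import numpy as np

n_dim = 10_000  # size of the original array
n = 100         # number of samples (1% of the array)
rng = np.random.default_rng(123)

# Integer index: one integer per selected sample
idx = rng.choice(n_dim, size=n, replace=False)

# Bool mask: one byte per element of the original array
mask = np.zeros(n_dim, dtype=bool)
mask[idx] = True

print(idx.nbytes, mask.nbytes)

# Sampling fraction at which both representations use the same memory
break_even = mask.itemsize / idx.itemsize
print(break_even)
```

With 8-byte indices the break-even fraction comes out at 12.5%, which falls inside the 10 to 20% range mentioned in the note.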

NOTE 2: Integer indexing returns the samples in random order. To randomly sample an array while preserving the original order, you need to sort the index first. The bool mask naturally returns the samples in their original order.
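A minimal sketch of the sorted-index variant follows. The data array is a hypothetical example, and the bool mask at the end is built only to check that both approaches select the same samples in the same order.

```python
import numpy as np

rng = np.random.default_rng(123)

# Hypothetical data to sample from
data = np.arange(10_000) * 2

# Random indices without repetition, in random order
idx = rng.choice(data.size, size=100, replace=False)

# Sorting the index restores the original order of the selected samples
idx_sorted = np.sort(idx)
samples = data[idx_sorted]

# Equivalent bool mask built from the same indices
mask = np.zeros(data.size, dtype=bool)
mask[idx] = True

# Both selections contain the same samples in the same order
same = np.array_equal(samples, data[mask])
```

Sorting 100 indices is cheap, so in the sparse case this should keep most of the speed advantage of the integer approach.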

To conclude, if you are performing sparse sampling and you don't care about order of the sampled items, the integer indexing shown here is likely to outperform other approaches.

user2304916 answered Sep 21 '22 05:09