I'm doing image processing using Python.
I am trying to randomly extract some pixels from the image.
Is it possible to make a random mask with NumPy?
What I have in mind is to set 1000 elements of a 10000-element array to True and all the others to False. Is that possible?
If not, is there any other way to make a random mask? Thank you.
Create an array of False values and set the first 1000 elements to True:
import numpy as np
a = np.full(10000, False)
a[:1000] = True
Afterwards, simply shuffle the array:
np.random.shuffle(a)
For a slightly faster solution, you can also create an array of integer zeros, set some values to 1, shuffle, and cast it to bool:
a = np.zeros(10000, dtype=int)
a[:1000] = 1
np.random.shuffle(a)
a = a.astype(bool)
In both cases you will have an array a with exactly 1000 True elements at random positions.
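Since the question is about picking pixels from an image, here is a minimal sketch of how such a mask might be applied. The image is a synthetic 100 x 100 RGB array, and the names img, mask, pixels, rows and cols are made up for illustration; a real image would come from whatever loader you use.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible

# Hypothetical stand-in for a real image: 100 x 100 pixels, 3 colour channels.
img = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

# Flat mask with exactly 1000 True values, shuffled in place.
mask = np.zeros(img.shape[0] * img.shape[1], dtype=bool)
mask[:1000] = True
rng.shuffle(mask)

# Reshape to height x width so the mask can index the first two axes.
mask = mask.reshape(img.shape[:2])

pixels = img[mask]             # one row of channel values per selected pixel
rows, cols = np.nonzero(mask)  # coordinates of the selected pixels
```

Boolean indexing returns the selected pixels in row-major order, so pixels[i] is the pixel at (rows[i], cols[i]).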
If instead you want each element to be picked independently from [True, False], you could use
np.random.choice([True, False], size=10000, p=[0.1, 0.9])
but note that you cannot predict the number of True elements in your array. You'll just know that on average you'll have 1000 of them.
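To illustrate why the count is not fixed, here is a small sketch that draws each element independently. It compares uniform random numbers against 0.1, which is an alternative I am assuming to be equivalent in distribution to the choice call above; the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily

# Each element is True independently with probability 0.1.
mask = rng.random(10000) < 0.1

# The number of True values follows a Binomial(10000, 0.1) distribution:
# mean 1000, standard deviation sqrt(10000 * 0.1 * 0.9) = 30.
n_true = int(mask.sum())
print(n_true)
```

Rerunning with a different seed will typically give a different count, usually within a few tens of 1000.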
A common solution is to create an array of random integer indices, which can be done efficiently with NumPy's random choice.
With this setup:
import numpy as np
n_dim = 10_000 # size of the original array
n = 100 # size of the random mask
rng = np.random.default_rng(123)
To create the array of random indices, we can use NumPy's choice, passing the array size as the first argument:
In [5]: %%timeit
...: m = rng.choice(n_dim, replace=False, size=n)
...:
...:
21.9 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For comparison, the boolean array approach mentioned in other answers (which requires shuffling an array of 0s and 1s) is considerably slower (more than 10x slower in this example):
In [7]: %%timeit
...: m = np.hstack([np.ones(n, dtype=bool), np.zeros(n_dim - n, dtype=bool)])
...: rng.shuffle(m)
...:
...:
261 µs ± 604 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
NOTE: Integer indexing works best in the sparse case, i.e. when selecting a small fraction of the samples from the original array. In that case the RAM usage of an integer index is much lower than that of a boolean mask. Once the fraction of samples exceeds roughly 10-20% of the original array, the boolean mask approach becomes more efficient.
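The memory side of this trade-off can be checked with nbytes. The sketch below only compares sizes and says nothing about speed; the variable names are illustrative, and the break-even estimate assumes one byte per boolean element.

```python
import numpy as np

n_dim = 10_000  # size of the original array
n = 100         # number of samples
rng = np.random.default_rng(123)

idx = rng.choice(n_dim, size=n, replace=False)  # integer index
mask = np.zeros(n_dim, dtype=bool)              # equivalent boolean mask
mask[idx] = True

# Bytes used by each representation.
print(idx.nbytes, mask.nbytes)

# The index is smaller as long as n * itemsize < n_dim, i.e. while the
# sampled fraction stays below 1 / itemsize.
break_even = 1 / idx.dtype.itemsize
print(break_even)
```

With 8-byte integers the break-even fraction is 12.5%, which is in line with the 10-20% range mentioned above.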
NOTE 2: Integer indexing returns the samples in random order. To randomly sample an array while preserving the original order, you need to sort the index. The boolean mask naturally returns the samples in their original order.
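A short sketch of the ordering point, using a made-up data array: sorting the index with np.sort gives the same result as selecting with the equivalent boolean mask.

```python
import numpy as np

rng = np.random.default_rng(123)
data = np.arange(10_000) * 2  # hypothetical data, increasing so order is visible

idx = rng.choice(data.size, size=100, replace=False)

unordered = data[idx]        # samples come back in the random order of idx
ordered = data[np.sort(idx)] # sorting the index restores the original order

# The equivalent boolean mask selects the same samples, already in order.
mask = np.zeros(data.size, dtype=bool)
mask[idx] = True
same = np.array_equal(ordered, data[mask])
print(same)
```

The extra sort adds a cost that grows with the number of samples, so it matters little in the sparse case.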
To conclude, if you are performing sparse sampling and you don't care about order of the sampled items, the integer indexing shown here is likely to outperform other approaches.