Is there anything I can do to speed up masked arrays in numpy? I had a terribly inefficient function that I re-wrote to use masked arrays (where I could just mask rows instead of make copies and delete rows as I was doing). However, I was shocked to find that the masked function was 10x slower because the masked arrays are so much slower.
As an example, take the following (masked is more then 6 times slower for me):
import timeit
import numpy as np
import numpy.ma as ma
def test(row):
return row[0] + row[1]
a = np.arange(1000).reshape(500, 2)
t = timeit.Timer('np.apply_along_axis(test, 1, a)','from __main__ import test, a, np')
print round(t.timeit(100), 6)
b = ma.array(a)
t = timeit.Timer('ma.apply_along_axis(test, 1, b)','from __main__ import test, b, ma')
print round(t.timeit(100), 6)
The key to making it fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs). This section motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient.
NumPy is fast because it can do all its calculations without calling back into Python. Since this function involves looping in Python, we lose all the performance benefits of using NumPy. For a 10,000,000-entry NumPy array, this functions takes 2.5 seconds to run on my computer.
NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.
I have no idea why the masked array functions are moving so slowly, but since it sounds like you are using the mask to select rows (as opposed to individual values), you can create a regular array from the masked rows and use the np function instead:
b.mask = np.zeros(500)
b.mask[498] = True
t = timeit.Timer('c=b.view(np.ndarray)[~b.mask[:,0]]; np.apply_along_axis(test, 1, c)','from __main__ import test, b, ma, np')
print round(t.timeit(100), 6)
Better yet, don't use masked arrays at all; just maintain your data and a 1D mask array as separate variables:
a = np.arange(1000).reshape(500, 2)
mask = np.ones(a.shape[0], dtype=bool)
mask[498] = False
out = np.apply_along_axis(test, 1, a[mask])
The most efficient way I am aware of is to handle the mask manually. Here a short benchmark for calculating a masked mean
along an axis. As of 2021 (np.version 1.19.2) the manual implementation is 3x faster.
It is worth noting, that
np.nanmean
is as slow as ma.mean
. However, I did not find an easy workaround for that, as 0 * nan -> nan
and np.where
is time consuming.opencv
usually has a mask argument for its routines. But switching library might not be suitable in most cases.benchmark manual (np.sum(..values..)/np.sum(..counts..))
time for 100x np_mean: 0.15721
benchmark ma.mean
time for 100x ma_mean: 0.580072
benchmark np.nanmean
time for 100x nan_mean: 0.609166
np_mean[:5]: [0.74468436 0.75447124 0.75628326 0.74990387 0.74708414]
ma_mean[:5]: [0.7446843592460088 0.7544712410870448 0.7562832614361736
0.7499038657880674 0.747084143818861]
nan_mean[:5]: [0.74468436 0.75447124 0.75628326 0.74990387 0.74708414]
np_mean == ma_mean: True
np_mean == nan_mean: True
np.__version__: 1.19.2
import timeit
import numpy as np
import numpy.ma as ma
np.random.seed(0)
arr = np.random.rand(1000, 1000)
msk = arr > .5 # POSITIV mask: only emelemts > .5 are processed
print('\nbenchmark manual (np.sum(..values..)/np.sum(..counts..))')
np_mean = np.sum(arr * msk, axis=0)/np.sum(msk, axis=0)
t = timeit.Timer('np_mean = np.sum(arr * msk, axis=0)/np.sum(msk, axis=0)', globals=globals())
print('\ttime for 100x np_mean:', round(t.timeit(100), 6))
print('\nbenchmark ma.mean')
ma_arr = ma.masked_array(arr, mask=~msk)
ma_mean = ma.mean(ma_arr, axis=0)
t = timeit.Timer('ma_mean = ma.mean(ma_arr, axis=0)', globals=globals())
print('\ttime for 100x ma_mean:', round(t.timeit(100), 6))
print('\nbenchmark np.nanmean')
nan_arr = arr.copy()
nan_arr[~msk] = np.nan
nan_mean = np.nanmean(nan_arr, axis=0)
t = timeit.Timer('nan_mean = np.nanmean(nan_arr, axis=0)', globals=globals())
print('\ttime for 100x nan_mean:', round(t.timeit(100), 6))
print('\n')
print('np_mean[:5]:', np_mean[:5])
print('ma_mean[:5]:', ma_mean[:5])
print('nan_mean[:5]:', nan_mean[:5])
print('np_mean == ma_mean: ', (np_mean == ma_mean).all())
print('np_mean == nan_mean: ', (np_mean == nan_mean).all())
print('np.__version__:', np.__version__)
The manaul Version only works if there are no nans
in the array.
If arr
contains nans
:
Just construct the mask by msk = np.isnan(arr)
and afterwards replace the nans in arr
by arr = np.nan_to_num(arr, copy=False, nan=0)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With