Today I profiled a function and found a (at least to me) weird bottleneck: creating a masked array with mask=None or mask=0 to initialize an all-zeros mask with the same shape as the data is very slow:
>>> import numpy as np
>>> data = np.ones((100, 100, 100))
>>> %timeit ma_array = np.ma.array(data, mask=None, copy=False)
1 loop, best of 3: 803 ms per loop
>>> %timeit ma_array = np.ma.array(data, mask=0, copy=False)
1 loop, best of 3: 807 ms per loop
On the other hand, using mask=False or creating the mask by hand is much faster:
>>> %timeit ma_array = np.ma.array(data, mask=False, copy=False)
1000 loops, best of 3: 438 µs per loop
>>> %timeit ma_array = np.ma.array(data, mask=np.zeros(data.shape, dtype=bool), copy=False)
1000 loops, best of 3: 453 µs per loop
Why is passing None or 0 as the mask parameter almost 2000 times slower than False or np.zeros(data.shape)? The docs for the mask parameter only say that it:
Must be convertible to an array of booleans with the same shape as data. True indicates a masked (i.e. invalid) data.
I'm using Python 3.5 and NumPy 1.11.0 on Windows 10.
A masked array is the combination of a standard numpy.ndarray and a mask. A mask is either nomask, indicating that no value of the associated array is invalid, or an array of booleans that determines for each element of the associated array whether the value is valid or not.
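The two mask representations can be seen directly (a minimal sketch; np.ma.getmask returns the mask, or the nomask sentinel when no element is masked):

```python
import numpy as np

# No mask given: the mask is the nomask sentinel, not a full boolean array.
x = np.ma.array([1.0, 2.0, 3.0])
print(np.ma.getmask(x) is np.ma.nomask)  # True

# Explicit mask: a boolean array with one entry per data element.
y = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
print(y.mask.tolist())  # [False, True, False]
```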
mask=False is special-cased in the NumPy 1.11.0 source code:
if mask is True and mdtype == MaskType:
mask = np.ones(_data.shape, dtype=mdtype)
elif mask is False and mdtype == MaskType:
mask = np.zeros(_data.shape, dtype=mdtype)
mask=0 and mask=None take the slow path: the mask is first built as a 0-dimensional array and then pushed through np.resize to expand it to the data's shape.
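The gap is easy to reproduce outside of np.ma by comparing the two mask-construction strategies directly (a rough sketch of what the two code paths do, not the actual np.ma internals):

```python
import numpy as np

data = np.ones((100, 100, 100))

# Roughly the slow path: mask=None/0 becomes a 0-d array that np.resize
# expands to data.shape, concatenating copies of it internally.
slow_mask = np.resize(np.array(False), data.shape)

# Roughly the fast path: mask=False is special-cased to a single np.zeros call.
fast_mask = np.zeros(data.shape, dtype=bool)

# Both yield the same all-False mask; only the construction cost differs.
assert slow_mask.shape == fast_mask.shape == data.shape
assert not slow_mask.any()
```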
I believe @user2357112 has the explanation. I profiled both cases; here are the results:
In [14]: q.run('q.np.ma.array(q.data, mask=None, copy=False)')
49 function calls in 0.161 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
3 0.000 0.000 0.000 0.000 :0(array)
1 0.154 0.154 0.154 0.154 :0(concatenate)
1 0.000 0.000 0.161 0.161 :0(exec)
11 0.000 0.000 0.000 0.000 :0(getattr)
1 0.000 0.000 0.000 0.000 :0(hasattr)
7 0.000 0.000 0.000 0.000 :0(isinstance)
1 0.000 0.000 0.000 0.000 :0(len)
1 0.000 0.000 0.000 0.000 :0(ravel)
1 0.000 0.000 0.000 0.000 :0(reduce)
1 0.000 0.000 0.000 0.000 :0(reshape)
1 0.000 0.000 0.000 0.000 :0(setprofile)
5 0.000 0.000 0.000 0.000 :0(update)
1 0.000 0.000 0.161 0.161 <string>:1(<module>)
1 0.000 0.000 0.161 0.161 core.py:2704(__new__)
1 0.000 0.000 0.000 0.000 core.py:2838(_update_from)
1 0.000 0.000 0.000 0.000 core.py:2864(__array_finalize__)
5 0.000 0.000 0.000 0.000 core.py:3264(__setattr__)
1 0.000 0.000 0.161 0.161 core.py:6119(array)
1 0.007 0.007 0.161 0.161 fromnumeric.py:1097(resize)
1 0.000 0.000 0.000 0.000 fromnumeric.py:128(reshape)
1 0.000 0.000 0.000 0.000 fromnumeric.py:1383(ravel)
1 0.000 0.000 0.000 0.000 numeric.py:484(asanyarray)
0 0.000 0.000 profile:0(profiler)
1 0.000 0.000 0.161 0.161 profile:0(q.np.ma.array(q.data, mask=None, copy=False))
In [15]: q.run('q.np.ma.array(q.data, mask=False, copy=False)')
37 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 :0(array)
1 0.000 0.000 0.000 0.000 :0(exec)
11 0.000 0.000 0.000 0.000 :0(getattr)
1 0.000 0.000 0.000 0.000 :0(hasattr)
5 0.000 0.000 0.000 0.000 :0(isinstance)
1 0.000 0.000 0.000 0.000 :0(setprofile)
5 0.000 0.000 0.000 0.000 :0(update)
1 0.000 0.000 0.000 0.000 :0(zeros)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 core.py:2704(__new__)
1 0.000 0.000 0.000 0.000 core.py:2838(_update_from)
1 0.000 0.000 0.000 0.000 core.py:2864(__array_finalize__)
5 0.000 0.000 0.000 0.000 core.py:3264(__setattr__)
1 0.000 0.000 0.000 0.000 core.py:6119(array)
0 0.000 0.000 profile:0(profiler)
1 0.000 0.000 0.000 0.000 profile:0(q.np.ma.array(q.data, mask=False, copy=False))
So the concatenate call, invoked by np.resize while expanding the 0-dimensional mask, is the bottleneck.
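In practice, either of the following avoids the resize/concatenate path, assuming all you want is an all-False initial mask:

```python
import numpy as np

data = np.ones((100, 100, 100))

# Fast: mask=False is special-cased to np.zeros(data.shape, dtype=bool).
a = np.ma.array(data, mask=False, copy=False)

# Fast: hand over a ready-made boolean mask directly.
b = np.ma.array(data, mask=np.zeros(data.shape, dtype=bool), copy=False)

# Both produce a full-size, all-False mask.
assert np.ma.getmaskarray(a).shape == data.shape
assert not np.ma.getmaskarray(a).any()
assert not np.ma.getmaskarray(b).any()
```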