I was able to optimise some operations in my program quite a bit using numpy. When I profiled a run, I noticed that most of the time was spent in numpy.nan_to_num. I'd like to improve this even further.
The sort of calculations occurring are multiplications of two arrays, where one of the arrays may contain nan values. I want these to be treated as zeros, but I can't initialise the array with zeros, as nan has a meaning later on and can't be set to 0. Is there a way of doing multiplications (and additions) with nan treated as zero?
From the nan_to_num docstring, I can see that a new array is produced, which may explain why it's taking so long:
Replace nan with zero and inf with finite numbers.
Returns an array or scalar replacing Not a Number (NaN) with zero,...
A function like nansum for arbitrary arithmetic operations would be great.
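To illustrate the kind of behaviour I mean, np.nansum already does this for summation, with nan entries contributing nothing to the result:
import numpy as np
x = np.array([1.0, np.nan, 2.0])
print(np.nansum(x))  # 3.0 -- the nan is effectively treated as 0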
Here's some example data:
import numpy as np
a = np.random.rand(1000, 1000)
a[a < 0.1] = np.nan # set some random values to nan
b = np.ones_like(a)
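The slow pattern is roughly the following (a sketch of the approach described above, not the exact code), with nan_to_num allocating a full cleaned-up copy of a before the multiplication:
slow_result = np.nan_to_num(a) * b  # a whole new array is created just to replace the nans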
One option is to use np.where to set the value of the result to 0 wherever a is NaN:
result = np.where(np.isnan(a), 0, a * b)
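As a quick sanity check (assuming a and b from the example above, where only a contains NaNs), this agrees with what a nan_to_num-based version would produce:
assert np.allclose(result, np.nan_to_num(a) * b)  # same values as the nan_to_num approach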
If you have to do several operations on an array that contains NaNs, you might consider using masked arrays, which provide a more general method for dealing with missing or invalid values:
masked_a = np.ma.masked_invalid(a)
result2 = masked_a * b
Here, result2 is another np.ma.masked_array whose .mask attribute is set according to where the NaN values were in a. To convert this back to a normal np.ndarray with the masked values replaced by 0s, you can use the .filled() method, passing in the fill value of your choice:
result_filled = result2.filled(0)
assert np.all(result_filled == result)
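Since the mask propagates through masked-array arithmetic, you can also chain several operations and only call .filled() once at the end. A rough sketch, reusing masked_a and b from above:
chained = (masked_a * b) + masked_a  # still a masked array; the mask marks the original nan positions
chained_result = chained.filled(0)   # plain ndarray, masked entries replaced by 0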