
Numpy: multiplying with NaN values without using nan_to_num

I was able to optimise some operations in my program quite a bit using numpy. When I profiled a run, I noticed that most of the time is spent in numpy.nan_to_num. I'd like to improve this even further.

The calculations in question are multiplications of two arrays, one of which may contain NaN values. I want these to be treated as zeros, but I can't initialise the array with zeros, because NaN has a meaning later on and can't be replaced with 0. Is there a way of doing multiplications (and additions) with NaN being treated as zero?

From the nan_to_num docstring, I can see a new array is produced which may explain why it's taking so long.

Replace nan with zero and inf with finite numbers.

Returns an array or scalar replacing Not a Number (NaN) with zero,...

A function like nansum for arbitrary arithmetic operations would be great.

orange asked Sep 20 '25 17:09


1 Answer

Here's some example data:

import numpy as np

a = np.random.rand(1000, 1000)
a[a < 0.1] = np.nan    # set some random values to nan
b = np.ones_like(a)

One option is to use np.where to set the result to 0 wherever a is NaN (test with np.isnan, since NaN never compares equal to anything, including itself):

result = np.where(np.isnan(a), 0, a * b)
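A possible variant, not part of the original answer: NumPy ufuncs such as np.multiply accept `out` and `where` arguments, so the product can be written only at the non-NaN positions of a pre-zeroed output array. The small arrays and the names `valid` and `out` below are illustrative assumptions, and whether this is actually faster than np.where or nan_to_num would need to be profiled on your own data.

```python
import numpy as np

# Tiny illustrative inputs (assumed for this sketch)
a = np.array([1.0, np.nan, 3.0, np.nan])
b = np.array([2.0, 5.0, 4.0, 1.0])

valid = ~np.isnan(a)      # True where a holds a real number
out = np.zeros_like(a)    # positions skipped by `where` keep this 0

# Multiply only where `valid` is True; elsewhere `out` is left untouched
np.multiply(a, b, out=out, where=valid)

print(out)
```

The input array a is not modified, so its NaN entries remain available for later use.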

If you have to do several operations on an array that contains NaNs, you might consider using masked arrays, which provide a more general method for dealing with missing or invalid values:

masked_a = np.ma.masked_invalid(a)

result2 = masked_a * b

Here, result2 is another np.ma.masked_array whose .mask attribute is set according to where the NaN values were in a. To convert this back to a normal np.ndarray with the masked values replaced by 0s, you can use the .filled() method, passing in the fill value of your choice:

result_filled = result2.filled(0)

assert np.all(result_filled == result)
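Since the question also mentions additions, one caveat is worth sketching. Masked arithmetic propagates the mask, so filling the result of an addition with 0 gives 0 at the masked positions rather than the other operand's value. If NaN should behave like a zero term in a sum, one approach is to fill before adding. The arrays and variable names below are assumptions made for illustration.

```python
import numpy as np

# Tiny illustrative inputs (assumed for this sketch)
a = np.array([1.0, np.nan, 3.0])
b = np.array([10.0, 20.0, 30.0])

masked_a = np.ma.masked_invalid(a)

# Mask propagates through the addition: the masked slot becomes 0 after filling
sum_masked = (masked_a + b).filled(0)

# Fill first, then add: NaN in a contributes 0, so b's value survives
sum_nan_as_zero = masked_a.filled(0) + b

print(sum_masked)
print(sum_nan_as_zero)
```

For multiplication the two orders give the same result, because anything multiplied by 0 is 0; the difference only shows up for operations such as addition.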
ali_m answered Sep 22 '25 08:09