I would like to add thousands of 4D arrays element-wise, accounting for NaNs. A simple example using 1D arrays would be:
X = array([4, 7, 89, nan, 89, 65, nan])
Y = array([0, 5, 4, 9, 8, 100, nan])
z = X + Y
print z   # desired result: array([4, 12, 93, 9, 97, 165, nan])
I've written a simple for loop around this, but it takes forever - not a smart solution. Another option would be to stack everything into one larger array and use bottleneck's nansum, but that would take too much memory on my laptop. I need a running sum over 11000 cases.
Does anyone have a smart and fast way to do this?
Here is one possibility:
>>> x = np.array([1, 2, np.nan, 3, np.nan, 4])
>>> y = np.array([1, np.nan, 2, 5, np.nan, 8])
>>> x = np.ma.masked_array(np.nan_to_num(x), mask=np.isnan(x) & np.isnan(y))
>>> y = np.ma.masked_array(np.nan_to_num(y), mask=x.mask)
>>> (x+y).filled(np.nan)
array([ 2., 2., 2., 8., nan, 12.])
The real difficulty is that you seem to want nan to be interpreted as zero unless all values at a particular position are nan. This means that you must look at both x and y to determine which nans to replace. If you are okay with having every nan replaced by zero, then you can simply do np.nan_to_num(x) + np.nan_to_num(y).
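To extend this idea to a running sum over thousands of arrays, one option (a sketch of my own, not part of the answer above) is to keep a running total together with a boolean array that records whether any finite value has been seen at each position; the name running_nan_sum and the arrays iterable are placeholders for however the 11000 cases are produced:

import numpy as np

def running_nan_sum(arrays):
    # Sum equally shaped arrays, treating nan as 0 unless a position
    # is nan in every array, in which case it stays nan.
    total = None
    seen = None   # True where at least one finite value has been added
    for a in arrays:
        finite = ~np.isnan(a)
        contrib = np.where(finite, a, 0.0)   # nan -> 0 for this array
        if total is None:
            total = contrib
            seen = finite
        else:
            total += contrib      # accumulate in place to save memory
            seen |= finite
    total[~seen] = np.nan         # never saw a finite value here
    return total

With the two 1D arrays from the question, running_nan_sum([X, Y]) returns array([4., 12., 93., 9., 97., 165., nan]).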
You could do something like:
import numpy as np

arr1 = np.array([1.0, 1.0, np.nan, 1.0, 1.0, np.nan])
arr2 = np.array([1.0, 1.0, 1.0, 1.0, 1.0, np.nan])

# flag the positions where *both* arrays are nan
flags = np.isnan(arr1) & np.isnan(arr2)

# work on copies: replace nan with 0.0, add, then restore nan where both were nan
copy1 = arr1.copy()
copy2 = arr2.copy()
copy1[np.isnan(copy1)] = 0.0
copy2[np.isnan(copy2)] = 0.0
out = copy1 + copy2
out[flags] = np.nan

print out
# [  2.   2.   1.   2.   2.  nan]
The flags = np.isnan(arr1) & np.isnan(arr2) line finds the locations where both arrays have a NaN at the same index. Then do essentially what @mgilson suggested: make copies, replace the NaNs with 0.0, add the two arrays together, and finally set the flagged indices back to np.nan.
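A slightly more compact variant (a tweak of mine, not part of the answer above) skips the explicit copies and lets np.where build the zero-filled versions directly:

import numpy as np

arr1 = np.array([1.0, 1.0, np.nan, 1.0, 1.0, np.nan])
arr2 = np.array([1.0, 1.0, 1.0, 1.0, 1.0, np.nan])

# zero-filled versions are built on the fly instead of modifying copies
out = np.where(np.isnan(arr1), 0.0, arr1) + np.where(np.isnan(arr2), 0.0, arr2)
out[np.isnan(arr1) & np.isnan(arr2)] = np.nan
# out -> array([ 2.,  2.,  1.,  2.,  2.,  nan])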
import numpy as np
z = np.nansum([X, Y], axis=0)   # stacks X and Y and sums along the new axis, ignoring NaNs
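One caveat (my observation, not part of the answer): in recent NumPy versions np.nansum returns 0 rather than nan for positions that are nan in every input, so the last element differs from the output asked for in the question. A quick check with the question's arrays:

import numpy as np

X = np.array([4, 7, 89, np.nan, 89, 65, np.nan])
Y = np.array([0, 5, 4, 9, 8, 100, np.nan])

z = np.nansum([X, Y], axis=0)
# array([  4.,  12.,  93.,   9.,  97., 165.,    0.])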