 

Numpy cumsum considering NaNs

I am looking for a succinct way to go from:

 a = numpy.array([1,4,1,numpy.nan,2,numpy.nan])

to:

  b = numpy.array([1,5,6,numpy.nan,8,numpy.nan])

The best I can do currently is:

b = numpy.insert(numpy.cumsum(a[numpy.isfinite(a)]), (numpy.argwhere(numpy.isnan(a)) - numpy.arange(len(numpy.argwhere(numpy.isnan(a))))), numpy.nan)

Is there a shorter way to accomplish the same? What about doing a cumsum along an axis of a 2D array?

Benjamin asked Oct 24 '12 14:10


2 Answers

Pandas is a library built on top of NumPy. Its Series class has a cumsum method that preserves the NaNs and is considerably faster than the solution proposed by DSM:

In [14]: import numpy as np; import pandas as pd

In [15]: a = np.arange(10000.0)

In [16]: a[1] = np.nan

In [17]: %timeit a*0 + np.nan_to_num(a).cumsum()
1000 loops, best of 3: 465 us per loop

In [18]: s = pd.Series(a)

In [19]: s.cumsum()
Out[19]: 
0       0
1     NaN
2       2
3       5
...
9996    49965005
9997    49975002
9998    49985000
9999    49994999
Length: 10000

In [20]: %timeit s.cumsum()
10000 loops, best of 3: 175 us per loop
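
For the 2D part of the question, the same idea appears to carry over to a DataFrame, whose cumsum method takes an axis argument. The following is a minimal sketch rather than a benchmarked solution. It assumes pandas' default skipna=True behaviour, and the names m and df are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2D sample with NaNs in different rows and columns.
m = np.array([[1.0, np.nan, 2.0],
              [3.0, 4.0, np.nan]])
df = pd.DataFrame(m)

# axis=0 accumulates down each column, axis=1 along each row.
# With the default skipna=True, NaN entries stay NaN in the result
# and do not contribute to the running total.
down_columns = df.cumsum(axis=0).to_numpy()
along_rows = df.cumsum(axis=1).to_numpy()

print(down_columns)
print(along_rows)
```

Calling to_numpy() at the end converts the result back to a plain array, in case the rest of the code works with NumPy rather than pandas.
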
bmu answered Oct 22 '22 12:10


How about (for not-too-big arrays):

In [34]: import numpy as np

In [35]: a = np.array([1,4,1,np.nan,2,np.nan])

In [36]: a*0 + np.nan_to_num(a).cumsum()
Out[36]: array([  1.,   5.,   6.,  nan,   8.,  nan])
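
A possible variant that also covers the 2D case from the question: np.nancumsum (available in NumPy 1.12 and later) treats NaN as zero while accumulating and accepts an axis argument, and np.where can then put the NaNs back. This is only a sketch. The helper name cumsum_keep_nan and the sample array m are invented for illustration and are not part of NumPy.

```python
import numpy as np

def cumsum_keep_nan(a, axis=0):
    """Cumulative sum that skips NaNs in the running total but keeps them in place."""
    a = np.asarray(a, dtype=float)
    # nancumsum treats NaN as zero while accumulating along the given axis.
    total = np.nancumsum(a, axis=axis)
    # Restore NaN at every position where the input was NaN.
    return np.where(np.isnan(a), np.nan, total)

a = np.array([1, 4, 1, np.nan, 2, np.nan])
b = cumsum_keep_nan(a)

# Hypothetical 2D sample.
m = np.array([[1.0, np.nan, 2.0],
              [3.0, 4.0, np.nan]])
rows = cumsum_keep_nan(m, axis=1)  # accumulate along each row
cols = cumsum_keep_nan(m, axis=0)  # accumulate down each column

print(b)
print(rows)
print(cols)
```

Unlike a*0, which also turns infinite entries into NaN, the np.isnan mask only touches positions that were NaN to begin with.
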
DSM answered Oct 22 '22 12:10