 

Sum by groups, preserving NAs

Tags: python, pandas

I am trying to aggregate the dataframe so that there is one row per date (for each group).

     Cod1       Cod2    Date     E   A       S
327 100013.0    001 2019-02-01  0.0 0.0    511.0
323 100013.0    001 2019-02-01  0.0 -14.0   NaN
336 100013.0    001 2019-02-02  0.0 -28.0   NaN
341 100013.0    001 2019-02-03  0.0 -6.0    NaN
350 100013.0    001 2019-02-03  0.0 -3.0    NaN
373 100013.0    001 2019-02-07  0.0 -15.0   0
377 100013.0    001 2019-02-07  0.0 -9.0    NaN
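
For reference, a minimal sketch that rebuilds the sample frame above (assuming Cod2 is stored as a string, so it keeps its leading zeros):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Cod1': [100013.0] * 7,
    'Cod2': ['001'] * 7,
    'Date': pd.to_datetime(['2019-02-01', '2019-02-01', '2019-02-02',
                            '2019-02-03', '2019-02-03', '2019-02-07', '2019-02-07']),
    'E': [0.0] * 7,
    'A': [0.0, -14.0, -28.0, -6.0, -3.0, -15.0, -9.0],
    'S': [511.0, np.nan, np.nan, np.nan, np.nan, 0.0, np.nan],
}, index=[327, 323, 336, 341, 350, 373, 377])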

Using the following:

df = df.groupby(['Date', 'Cod1', 'Cod2'])[['E', 'A', 'S']].sum()

I got the following output:

2019-02-01  100013.0    001 0.0 -14.0   511.0
2019-02-02  100013.0    001 0.0 -28.0   0.0
2019-02-03  100013.0    001 0.0 -9.0    0.0
2019-02-07  100013.0    001 0.0 -24.0   0.0

My question is:

  • Is there some way to aggregate while preserving NaN?

There are 3 scenarios:

1) Two rows on the same date, with the last column holding a NaN and a non-null number:

327 100013.0    001 2019-02-01  0.0 0.0    511.0
323 100013.0    001 2019-02-01  0.0 -14.0   NaN

In this situation I would always like to keep the number.

2) Two rows on the same date, with the last column holding two NaN values:

341 100013.0    001 2019-02-03  0.0 -6.0    NaN
350 100013.0    001 2019-02-03  0.0 -3.0    NaN

In this situation I would always like to keep the NaN.

3) Two rows on the same date, with the last column holding one zero and one NaN:

373 100013.0    001 2019-02-07  0.0 -15.0   0
377 100013.0    001 2019-02-07  0.0 -9.0    NaN

In this situation I would always like to keep the 0.


In short: sum each group ignoring NaN, but keep NaN when every value in the group is NaN. So my expected output should be:

2019-02-01  100013.0    001 0.0 -14.0   511.0
2019-02-02  100013.0    001 0.0 -28.0   NaN
2019-02-03  100013.0    001 0.0 -9.0    NaN
2019-02-07  100013.0    001 0.0 -24.0   0.0
Cesar asked Jan 26 '23


2 Answers

Check min_count

df.groupby(['Date', 'Cod1', 'Cod2'])[['E', 'A', 'S']].sum(min_count=1)
Out[260]: 
                            E     A      S
Date       Cod1     Cod2                  
2019-02-01 100013.0 1     0.0 -14.0  511.0
2019-02-02 100013.0 1     0.0 -28.0    NaN
2019-02-03 100013.0 1     0.0  -9.0    NaN
2019-02-07 100013.0 1     0.0 -24.0    0.0
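
By default, sum() treats an all-NaN group as an empty sum and returns 0.0. Passing min_count=1 requires at least one valid (non-NaN) value before a result is produced, so all-NaN groups come back as NaN instead. A quick standalone check on a toy series (not from the question):

import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])
print(s.sum())             # 0.0 -- empty sum, default min_count=0
print(s.sum(min_count=1))  # nan -- fewer than 1 valid value in the group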
BENY answered Feb 16 '23


I guess a custom function can do it:

import numpy as np

(df.groupby(['Date', 'Cod1', 'Cod2'])
    [['E', 'A', 'S']]
    .agg(lambda x: np.nan if x.isna().all() else x.sum())
)

Output:

                            E     A      S
Date       Cod1     Cod2                  
2019-02-01 100013.0 1     0.0 -14.0  511.0
2019-02-02 100013.0 1     0.0 -28.0    NaN
2019-02-03 100013.0 1     0.0  -9.0    NaN
2019-02-07 100013.0 1     0.0 -24.0    0.0
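
Note that this lambda encodes the same rule as min_count: .agg(lambda x: x.sum(min_count=1)) would give identical results here. The built-in sum(min_count=1) path is vectorized and will be faster on large frames, while the explicit lambda makes the all-NaN rule easier to read and to adjust per column.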
Quang Hoang answered Feb 16 '23