Group by and find sum for groups but return NaN as NaN, not 0

Tags: python, pandas, dataframe, nan, numpy

I have a dataframe where each unique group has 4 rows. I need to group by the columns that make each group unique and compute aggregations such as max, min, sum and mean. The problem is that for some groups a column contains only NaN values, and the sum returns 0. Is it possible to return NaN instead? For example, df:

       time            id     el    conn   column1  column2  column3
2018-02-11 14:00:00     1     a      12      8        5         NaN
2018-02-11 14:00:00     1     a      12      1        NaN       NaN
2018-02-11 14:00:00     1     a      12      3        7         NaN
2018-02-11 14:00:00     1     a      12      4        12        NaN
2018-02-11 14:00:00     2     a      5       NaN      5         5
2018-02-11 14:00:00     2     a      5       NaN      3         2
2018-02-11 14:00:00     2     a      5       NaN      NaN       6
2018-02-11 14:00:00     2     a      5       NaN      7         NaN

So, for example, I need to group by ('id', 'el', 'conn') and find the sum of column1, column2 and column3. (In the real case I have many more columns to aggregate.) I have tried a few ways, such as .sum() and .transform('sum'), but they return zero for a group with all NaN values.

Desired output:

    time               id    el     conn   column1  column2  column3
2018-02-11 14:00:00     1     a      12      16       24       NaN
2018-02-11 14:00:00     2     a      5       NaN      15        13

Any help is welcomed.

asked Mar 12 '18 11:03

jovicbg

1 Answer

Set the parameter min_count to 1. This works in pandas 0.22.0, the latest version at the time of writing:

min_count : int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

New in version 0.22.0: Added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.
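The effect of min_count can be checked on a plain Series before using it in a groupby. This is a minimal sketch, assuming pandas 0.22 or later; the Series names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one all-NaN Series, one with a single valid value.
all_nan = pd.Series([np.nan, np.nan])
mixed = pd.Series([1.0, np.nan])

# With the default min_count=0, an all-NaN sum is 0.0.
print(all_nan.sum())

# With min_count=1, at least one non-NA value is required, so the result is NaN.
print(all_nan.sum(min_count=1))

# One valid value is enough to satisfy min_count=1.
print(mixed.sum(min_count=1))
```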

df = df.groupby(['time', 'id', 'el', 'conn'], as_index=False).sum(min_count=1)
print(df)
                  time  id el  conn  column1  column2  column3
0  2018-02-11 14:00:00   1  a    12     16.0     24.0      NaN
1  2018-02-11 14:00:00   2  a     5      NaN     15.0     13.0
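
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the question's data and also covers the other aggregations mentioned (min, max, mean). The helper name nan_sum and the variable names are assumptions made for illustration, not part of pandas. min, max and mean already return NaN for an all-NaN group, so only sum needs special handling.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    'time': ['2018-02-11 14:00:00'] * 8,
    'id': [1, 1, 1, 1, 2, 2, 2, 2],
    'el': ['a'] * 8,
    'conn': [12, 12, 12, 12, 5, 5, 5, 5],
    'column1': [8, 1, 3, 4, np.nan, np.nan, np.nan, np.nan],
    'column2': [5, np.nan, 7, 12, 5, 3, np.nan, 7],
    'column3': [np.nan, np.nan, np.nan, np.nan, 5, 2, 6, np.nan],
})

keys = ['time', 'id', 'el', 'conn']
value_cols = ['column1', 'column2', 'column3']

# Sum only: groups with no valid values stay NaN instead of becoming 0.
sums = df.groupby(keys, as_index=False).sum(min_count=1)
print(sums)


def nan_sum(s):
    """Hypothetical helper: sum that returns NaN for an all-NaN group."""
    return s.sum(min_count=1)


# Several aggregations at once; the result has two-level column labels
# such as ('column1', 'nan_sum').
stats = df.groupby(keys)[value_cols].agg([nan_sum, 'min', 'max', 'mean'])
print(stats)
```

Passing a Python function to agg is usually slower than the built-in sum(min_count=1), so on large frames it may be better to compute the sums separately and join them with the other statistics.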

answered Sep 28 '22 06:09

jezrael

