Hi, I am running into an issue where my events data source looks like this:
   event_id             device_id            timestamp  longitude  latitude
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24
1         2  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97
2         3  -4833982096941402721  2016-05-01 00:08:05     106.60     29.70
I am trying to group the events by device_id and then get the sum/mean/std of a variable over every event with that device_id:
events['latitude_mean'] = events.groupby(['device_id'])['latitude'].aggregate(np.sum)
But my output is always:
   event_id             device_id            timestamp  longitude  latitude
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24
1         2  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97
2         3  -4833982096941402721  2016-05-01 00:08:05     106.60     29.70
3         4  -6815121365017318426  2016-05-01 00:06:40     104.27     23.28
4         5  -5373797595892518570  2016-05-01 00:07:18     115.88     28.66

   latitude_mean
0            NaN
1            NaN
2            NaN
3            NaN
4            NaN
What am I doing wrong that makes the result NaN for every row?
You can use the pandas.core.groupby.GroupBy.transform(aggfunc) method, which applies aggfunc to all rows in each group:
In [32]: events['latitude_mean'] = events.groupby(['device_id'])['latitude'].transform('sum')

In [33]: events
Out[33]:
   event_id             device_id            timestamp  longitude  latitude  latitude_mean
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24          62.55
1         2     29182687948017175  2016-05-30 12:12:12     777.77     31.31          62.55
2         3  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97          64.30
3         4  -6401643145415154744  2016-01-01 11:11:11     111.11     33.33          64.30
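Since the question asks for sum/mean/std together, the same transform pattern extends directly to those aggregations. A minimal sketch, using a hypothetical frame mirroring the data above:

```python
import pandas as pd

# Hypothetical toy frame with the same device_ids and latitudes as above
events = pd.DataFrame({
    'event_id': [1, 2, 3, 4],
    'device_id': [29182687948017175, 29182687948017175,
                  -6401643145415154744, -6401643145415154744],
    'latitude': [31.24, 31.31, 30.97, 33.33],
})

g = events.groupby('device_id')['latitude']

# transform returns a Series aligned to the ORIGINAL index,
# so each assignment lands on the right rows
events['latitude_sum'] = g.transform('sum')
events['latitude_mean'] = g.transform('mean')
events['latitude_std'] = g.transform('std')

print(events)
```

Each transform call broadcasts the per-group result back over every member row, which is exactly what makes the column assignment work.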
Here you may find some usage examples.
Explanation: when you group your DF, the result is usually a Series with fewer rows and a different index, so pandas doesn't know how to align it when assigning it to a new column, and you end up with NaNs:
In [31]: events.groupby(['device_id'])['latitude'].agg(np.sum)
Out[31]:
device_id
-6401643145415154744    64.30
 29182687948017175      62.55
Name: latitude, dtype: float64
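If you do want the collapsed per-group result from agg, you can still broadcast it back onto the original rows by mapping device_id through the aggregated Series' index. A sketch with the same hypothetical values:

```python
import pandas as pd

# Hypothetical frame matching the answer's data
events = pd.DataFrame({
    'device_id': [29182687948017175, 29182687948017175,
                  -6401643145415154744, -6401643145415154744],
    'latitude': [31.24, 31.31, 30.97, 33.33],
})

# agg collapses to one row per device_id, indexed by device_id ...
sums = events.groupby('device_id')['latitude'].agg('sum')

# ... so mapping device_id onto that index aligns it back row by row
events['latitude_sum'] = events['device_id'].map(sums)
```

This is equivalent to transform('sum') here, but keeps the small per-group Series around if you also need it on its own.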
so when you try to assign it to a new column, pandas does something like this:
In [36]: events['nans'] = pd.Series([1,2], index=['a','b'])

In [38]: events[['event_id','nans']]
Out[38]:
   event_id  nans
0         1   NaN
1         2   NaN
2         3   NaN
3         4   NaN
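To see that the NaNs really come from index alignment, compare assigning a Series with a foreign index against one whose index matches. A small sketch (the column names 'nans' and 'ok' are made up for illustration):

```python
import pandas as pd

events = pd.DataFrame({'event_id': [1, 2, 3, 4]})

# Foreign index ('a', 'b') matches none of events' row labels,
# so every cell in the new column becomes NaN
events['nans'] = pd.Series([1, 2], index=['a', 'b'])

# A Series whose index matches aligns row by row as expected
events['ok'] = pd.Series([10, 20, 30, 40], index=events.index)

print(events)
```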
Data:

In [30]: events
Out[30]:
   event_id             device_id            timestamp  longitude  latitude
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24
1         2     29182687948017175  2016-05-30 12:12:12     777.77     31.31
2         3  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97
3         4  -6401643145415154744  2016-01-01 11:11:11     111.11     33.33