Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Groupby Aggregate method is returning NaN always

Tags:

python

pandas

Hi I am running into this issue where my datasource events looks like this:

   event_id             device_id            timestamp  longitude  latitude
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24
1         2  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97
2         3  -4833982096941402721  2016-05-01 00:08:05     106.60     29.7

I am trying to group the events by the device_id and then get the sum/mean/std of the variable over every event with that device_id:

events['latitude_mean'] = events.groupby(['device_id'])['latitude'].aggregate(np.sum)

But my Output is always:

event_id             device_id            timestamp  longitude  latitude
0         1     29182687948017175  2016-05-01 00:55:25     121.38     31.24   
1         2  -6401643145415154744  2016-05-01 00:54:12     103.65     30.97   
2         3  -4833982096941402721  2016-05-01 00:08:05     106.60     29.70   
3         4  -6815121365017318426  2016-05-01 00:06:40     104.27     23.28   
4         5  -5373797595892518570  2016-05-01 00:07:18     115.88     28.66   

   latitude_mean  
0            NaN  
1            NaN  
2            NaN  
3            NaN  
4            NaN

What am I doing wrong to keep getting the return value to be NaN for each row?

like image 278
Bryan Dickens Avatar asked Jul 23 '16 07:07

Bryan Dickens


People also ask

Does GROUP BY include NaN values?

NaN values mean "Not-a-Number" which generally means that there are some missing values in the cell. Here, we are going to learn how to groupby column values with NaN values, as the groupby method usually excludes the NaN values hence to include NaN values, we use groupby method with some special parameters.

What does GROUP BY function return?

Returns a groupby object that contains information about the groups. Convenience method for frequency conversion and resampling of time series. See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.

What does GROUP BY sum return?

groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.

What is AGG in GROUP BY?

agg is an alias for aggregate . Use the alias. A passed user-defined-function will be passed a Series for evaluation. The aggregation is for each column.


1 Answers

you can use pandas.core.groupby.GroupBy.transform(aggfunc) method, which applies aggfunc to all rows in each group:

In [32]: events['latitude_mean'] = events.groupby(['device_id'])['latitude'].transform('sum')

In [33]: events
Out[33]:
   event_id            device_id            timestamp  longitude  latitude  latitude_mean
0         1    29182687948017175  2016-05-01 00:55:25     121.38     31.24          62.55
1         2    29182687948017175  2016-05-30 12:12:12     777.77     31.31          62.55
2         3 -6401643145415154744  2016-05-01 00:54:12     103.65     30.97          64.30
3         4 -6401643145415154744  2016-01-01 11:11:11     111.11     33.33          64.30

Here you may find some usage examples

Explanation: when you group your DF - as a result you usually have a series containing less rows and with different index, so pandas doesn't know how to align it when assigning it to a new column and as a result you have NaN's:

In [31]: events.groupby(['device_id'])['latitude'].agg(np.sum)
Out[31]:
device_id
-6401643145415154744    64.30
 29182687948017175      62.55
Name: latitude, dtype: float64

so when you try to assign it to a new column, pandas does something like this:

In [36]: events['nans'] = pd.Series([1,2], index=['a','b'])

In [38]: events[['event_id','nans']]
Out[38]:
   event_id  nans
0         1   NaN
1         2   NaN
2         3   NaN
3         4   NaN

Data:

In [30]: events
Out[30]:
   event_id            device_id            timestamp  longitude  latitude
0         1    29182687948017175  2016-05-01 00:55:25     121.38     31.24
1         2    29182687948017175  2016-05-30 12:12:12     777.77     31.31
2         3 -6401643145415154744  2016-05-01 00:54:12     103.65     30.97
3         4 -6401643145415154744  2016-01-01 11:11:11     111.11     33.33
like image 94
MaxU - stop WAR against UA Avatar answered Oct 14 '22 19:10

MaxU - stop WAR against UA