Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas GroupBy and add count of unique values as a new column

Problem

I'm trying to group a Pandas DataFrame by columns day and source, and then count unique instances of a 3rd (timestamp) for each group (to find out how many hits each source got per day), but I get this error (full traceback below): TypeError: 'method' object is not subscriptable

Attempts

My code is:

df['timestamp_count'] = df.groupby(["source", "day"]).agg({"timestamp": "nunique"})

And ideally I'd like to get a result like this (including the new column, timestamp_count:

day    source    timestamp                timestamp_count
1      facebook  2018-08-04 11:16:32.416  2
1      facebook  2019-01-03 10:25:38.216  2
1      twitter   2018-10-14 13:26:22.123  1
2      facebook  2019-01-30 12:16:32.416  1

When I run df.describe(), I see that source is an object but day is an integer. Could this be part of the problem? Having looked at other questions, it seems probable that it is along those lines, yet none of them seem to throw the same input/error combination.

Can anyone help a Python newbie? Many thanks!

Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-90-db5a002137ca> in <module>
----> 1 df['timestamp_count'] = (df.groupby["source", 
"day"]).agg({"timestamp": "nunique"})

TypeError: 'method' object is not subscriptable
like image 942
FPL Avatar asked Jan 26 '23 18:01

FPL


1 Answers

Use transform to broadcast the result:

df['timestamp_count'] = (
    df.groupby(["source", "day"])['timestamp'].transform('nunique'))
df

   day    source                timestamp  timestamp_count
0    1  facebook  2018-08-04 11:16:32.416                2
1    1  facebook  2019-01-03 10:25:38.216                2
2    1   twitter  2018-10-14 13:26:22.123                1
3    2  facebook  2019-01-30 12:16:32.416                1
like image 180
cs95 Avatar answered Feb 12 '23 14:02

cs95