I'm just getting started with Pandas and am trying to combine two steps: grouping my data by date and counting the unique values in each group.
Here's what my data looks like:
Datetime, User, Type
2014-04-15 11:00:00, A, New
2014-04-15 12:00:00, B, Returning
2014-04-15 13:00:00, C, New
2014-04-20 14:00:00, D, New
2014-04-20 15:00:00, B, Returning
2014-04-20 16:00:00, B, Returning
2014-04-20 17:00:00, D, Returning
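To make the examples reproducible, here is a sketch that rebuilds the sample data as a DataFrame. It assumes 'Datetime' is the index and 'User' and 'Type' are ordinary columns, as the layout above suggests.

```python
import pandas as pd

# Sample data from the question (assumed layout: 'Datetime' is the index,
# 'User' and 'Type' are regular columns).
df = pd.DataFrame(
    {
        "User": ["A", "B", "C", "D", "B", "B", "D"],
        "Type": ["New", "Returning", "New", "New",
                 "Returning", "Returning", "Returning"],
    },
    index=pd.DatetimeIndex(
        ["2014-04-15 11:00:00", "2014-04-15 12:00:00", "2014-04-15 13:00:00",
         "2014-04-20 14:00:00", "2014-04-20 15:00:00", "2014-04-20 16:00:00",
         "2014-04-20 17:00:00"],
        name="Datetime",
    ),
)
print(df)
```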
Here's what I'd like to get: resample the datetime index to days (which I can do) and also count the unique users for each day. I'm not interested in the 'Type' column yet.
Day, Unique Users
2014-04-15, 3
2014-04-20, 2
I'm trying df.user.resample('D', how='count').unique but it doesn't seem to give me the right answer.
You don't need to resample to get the output in your question. A groupby on the date is enough:
print(df.groupby(df.index.date)['User'].nunique())
2014-04-15 3
2014-04-20 2
dtype: int64
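A possible variation, sketched on the question's sample data: grouping on df.index.normalize() instead of df.index.date. normalize() sets each timestamp's time to midnight, so the group keys stay Timestamps and the result keeps a DatetimeIndex that can be resampled without converting it first.

```python
import pandas as pd

# Sample data from the question (only the 'User' column is needed here).
df = pd.DataFrame(
    {"User": ["A", "B", "C", "D", "B", "B", "D"]},
    index=pd.DatetimeIndex(
        ["2014-04-15 11:00", "2014-04-15 12:00", "2014-04-15 13:00",
         "2014-04-20 14:00", "2014-04-20 15:00", "2014-04-20 16:00",
         "2014-04-20 17:00"],
        name="Datetime",
    ),
)

# Group by the midnight-normalized timestamp and count distinct users per day.
cnt = df.groupby(df.index.normalize())["User"].nunique()
print(cnt)
```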
If you then want to fill in the gaps in the time series, you can resample after counting the unique users:
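import pandas as pd
cnt = df.groupby(df.index.date)['User'].nunique()
cnt.index = pd.to_datetime(cnt.index)  # datetime.date keys -> DatetimeIndex so resample works
print(cnt.resample('D').asfreq())  # days with no rows become NaN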
2014-04-15 3
2014-04-16 NaN
2014-04-17 NaN
2014-04-18 NaN
2014-04-19 NaN
2014-04-20 2
Freq: D, dtype: float64
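If zeros are preferable to NaN for days without any users, one alternative (not the approach used above) is to reindex the counts against a full daily date range. A sketch, using the counts from the example as input:

```python
import pandas as pd

# Daily unique-user counts as produced by the groupby step (values from the example).
cnt = pd.Series([3, 2], index=pd.to_datetime(["2014-04-15", "2014-04-20"]), name="User")

# Every calendar day from the first to the last date in the counts.
all_days = pd.date_range(cnt.index.min(), cnt.index.max(), freq="D")

# fill_value=0 marks days with no rows as 0 users instead of NaN,
# which also keeps the integer dtype.
filled = cnt.reindex(all_days, fill_value=0)
print(filled)
```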
I ran into the same problem, and resample works with nunique. The nice thing about resample is that it makes it simple to change the sampling rate, for example to hours or minutes, and the timestamp is kept as the index.
df['User'].resample('D').nunique()
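A self-contained sketch of the resample approach on the question's sample data, also showing the sampling rate being changed to hourly bins. The hourly rule is passed as pd.offsets.Hour(), a DateOffset, rather than a string alias; that choice is just one option. Days or hours without any rows still appear in the resampled index.

```python
import pandas as pd

# Sample data from the question (only the 'User' column is needed here).
df = pd.DataFrame(
    {"User": ["A", "B", "C", "D", "B", "B", "D"]},
    index=pd.DatetimeIndex(
        ["2014-04-15 11:00", "2014-04-15 12:00", "2014-04-15 13:00",
         "2014-04-20 14:00", "2014-04-20 15:00", "2014-04-20 16:00",
         "2014-04-20 17:00"],
        name="Datetime",
    ),
)

# Distinct users per calendar day; the result is indexed by Timestamp.
daily = df["User"].resample("D").nunique()

# Same idea with one-hour bins, passing a DateOffset as the rule.
hourly = df["User"].resample(pd.offsets.Hour()).nunique()

print(daily)
print(hourly.head())
```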