
Pandas: Count Unique Values after Resample

Tags: python, pandas

I'm just getting started with Pandas and am trying to combine two steps: grouping my data by date, and counting the unique values in each group.

Here's what my data looks like:

                  User, Type
Datetime
2014-04-15 11:00:00, A, New
2014-04-15 12:00:00, B, Returning
2014-04-15 13:00:00, C, New
2014-04-20 14:00:00, D, New
2014-04-20 15:00:00, B, Returning
2014-04-20 16:00:00, B, Returning
2014-04-20 17:00:00, D, Returning

And here's what I would like to get to: Resample the datetime index to the day (which I can do), and also count the unique users for each day. I'm not interested in the 'Type' column yet.

Day, Unique Users
2014-04-15, 3
2014-04-20, 2

I'm trying df.user.resample('D', how='count').unique but it doesn't seem to give me the right answer.

Malcolm Bastien asked Apr 22 '14 00:04


2 Answers

You don't need to do a resample to get the desired output in your question. I think you can get by with just a groupby on date:

print(df.groupby(df.index.date)['User'].nunique())

2014-04-15    3
2014-04-20    2
dtype: int64
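
For anyone who wants to run this end to end, here is a minimal sketch. The question doesn't show how the frame was loaded, so the DataFrame construction below is an assumption that simply retypes the sample rows; the groupby line is the same as above.

```python
import pandas as pd

# Retype the sample rows from the question (the construction itself is an
# assumption; the original loading code is not shown).
df = pd.DataFrame(
    {
        "User": ["A", "B", "C", "D", "B", "B", "D"],
        "Type": ["New", "Returning", "New", "New",
                 "Returning", "Returning", "Returning"],
    },
    index=pd.to_datetime(
        [
            "2014-04-15 11:00:00",
            "2014-04-15 12:00:00",
            "2014-04-15 13:00:00",
            "2014-04-20 14:00:00",
            "2014-04-20 15:00:00",
            "2014-04-20 16:00:00",
            "2014-04-20 17:00:00",
        ]
    ),
)
df.index.name = "Datetime"

# Group rows by the calendar date of the index and count distinct users.
unique_per_day = df.groupby(df.index.date)["User"].nunique()
print(unique_per_day)
```

Note that only the dates that actually occur in the data show up in the result, since the grouping keys come straight from the index.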

Then, if you want, you can resample to fill in the gaps in the time series after counting the unique users:

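import pandas as pd
cnt = df.groupby(df.index.date)['User'].nunique()
cnt.index = pd.to_datetime(cnt.index)  # Index.to_datetime() no longer exists
# resample() alone returns a lazy Resampler; asfreq() inserts the missing days as NaN
print(cnt.resample('D').asfreq())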

2014-04-15     3
2014-04-16   NaN
2014-04-17   NaN
2014-04-18   NaN
2014-04-19   NaN
2014-04-20     2
Freq: D, dtype: float64
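
If you would rather see zeros than NaN on the empty days, something like the following should work. This is only a sketch: the two counts are copied from the output above rather than recomputed, and the names `with_gaps` and `filled` are mine. On recent pandas versions `resample('D')` by itself returns a lazy Resampler object, so `asfreq()` is what actually produces the daily series.

```python
import pandas as pd

# Per-day unique counts, copied from the groupby output above.
cnt = pd.Series([3, 2], index=pd.to_datetime(["2014-04-15", "2014-04-20"]))

# Expand to one row per calendar day; days with no data become NaN.
with_gaps = cnt.resample("D").asfreq()

# Optionally treat missing days as zero unique users and restore the int dtype.
filled = with_gaps.fillna(0).astype(int)
print(filled)
```

Whether a missing day should read as 0 or as NaN depends on what you do with the series afterwards, so keep `with_gaps` if "no data" and "no users" mean different things to you.
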
Karl D. answered Oct 31 '22 19:10


I came across the same problem. Resample worked for me with nunique. The nice thing about resample is that it makes it very simple to change the sampling frequency, for example to hours or minutes, and that the timestamp is kept as the index.

df['User'].resample('D').nunique()
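
Here is a small self-contained sketch of that, again assuming the frame is built by hand from the question's sample rows (only the User column is needed). It shows the daily count and the same call at an hourly frequency.

```python
import pandas as pd

# Sample rows from the question; building the frame by hand is an assumption.
df = pd.DataFrame(
    {"User": ["A", "B", "C", "D", "B", "B", "D"]},
    index=pd.to_datetime(
        [
            "2014-04-15 11:00:00",
            "2014-04-15 12:00:00",
            "2014-04-15 13:00:00",
            "2014-04-20 14:00:00",
            "2014-04-20 15:00:00",
            "2014-04-20 16:00:00",
            "2014-04-20 17:00:00",
        ]
    ),
)

# Unique users per calendar day; the result keeps a DatetimeIndex.
daily = df["User"].resample("D").nunique()

# Changing the frequency only means changing the rule string.
hourly = df["User"].resample("h").nunique()

print(daily)
```

One difference from the groupby approach: resample emits every bin between the first and last timestamp, so the days without any rows appear with a count of 0 instead of being left out.
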
mjspier answered Oct 31 '22 19:10