I have the table below in a Pandas dataframe:
date user_id whole_cost cost1
02/10/2012 00:00:00 1 1790 12
07/10/2012 00:00:00 1 364 15
30/01/2013 00:00:00 1 280 10
02/02/2013 00:00:00 1 259 24
05/03/2013 00:00:00 1 201 39
02/10/2012 00:00:00 3 623 1
07/12/2012 00:00:00 3 90 0
30/01/2013 00:00:00 3 312 90
02/02/2013 00:00:00 5 359 45
05/03/2013 00:00:00 5 301 34
02/02/2013 00:00:00 5 359 1
05/03/2013 00:00:00 5 801 12
..
The table was extracted from a csv file using the following query :
import pandas as pd
newnames = ['date','user_id', 'whole_cost', 'cost1']
df = pd.read_csv('expenses.csv', names = newnames, index_col = 'date')
I have to analyse the profile of my users and for this purpose:
I would like to group (for each user - they are thousands) queries by month summing the query whole_cost
for the entire month e.g. if user_id=1
was has a whole cost of 1790 on 02/10/2012 with cost1 12 and on the 07/10/2012 with whole cost 364, then it should have an entry in the new table of 2154 (as the whole cost) on 31/10/2012 (end of the month end-point representing the month - all dates in the transformed table will be month ends representing the whole month to which they relate).
In 0.14 you'll be able to groupby monthly and another column at the same time:
In [11]: df
Out[11]:
user_id whole_cost cost1
2012-10-02 1 1790 12
2012-10-07 1 364 15
2013-01-30 1 280 10
2013-02-02 1 259 24
2013-03-05 1 201 39
2012-10-02 3 623 1
2012-12-07 3 90 0
2013-01-30 3 312 90
2013-02-02 5 359 45
2013-03-05 5 301 34
2013-02-02 5 359 1
2013-03-05 5 801 12
In [12]: df1 = df.sort_index() # requires sorted DatetimeIndex
In [13]: df1.groupby([pd.TimeGrouper(freq='M'), 'user_id'])['whole_cost'].sum()
Out[13]:
user_id
2012-10-31 1 2154
3 623
2012-12-31 3 90
2013-01-31 1 280
3 312
2013-02-28 1 259
5 718
2013-03-31 1 201
5 1102
Name: whole_cost, dtype: int64
until 0.14 I think you're stuck with doing two groupbys:
In [14]: g = df.groupby('user_id')['whole_cost']
In [15]: g.resample('M', how='sum').dropna()
Out[15]:
user_id
1 2012-10-31 2154
2013-01-31 280
2013-02-28 259
2013-03-31 201
3 2012-10-31 623
2012-12-31 90
2013-01-31 312
5 2013-02-28 718
2013-03-31 1102
dtype: float64
With timegrouper getting deprecated, you can replace it with Grouper to get the same results
df.groupby(['user_id', pd.Grouper(key='date', freq='M')]).agg({'whole_cost':sum})
df.groupby(['user_id', df['date'].dt.dayofweek]).agg({'whole_cost':sum})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With