I have dataframe that I am trying to group by which looks like this
Cust_ID Store_ID month lst_buy_dt1 purchase_amt
1 20 10 2015-10-07 100
1 20 10 2015-10-09 200
1 20 10 2015-10-20 100
I need the maximum of ls_buy_dt
and maximum or purchase amount for each cust_ID
, Store_ID
combination for each month in a different dataframe. Sample ouput:
Cust_ID Stored_ID month max_lst_buy_dt tot_purchase_amt
1 20 10 2015-10-20 400
My code is below .
aggregations = {
'lst_buy_dt1': { # Get the max purchase date across all purchases in a month
'max_lst_buy_dt': 'max',
},
'purchase_amt': { # Sum the purchases
'tot_purchase': 'sum', # Find the max, call the result "max_date"
}
}
grouped_at_Cust=metro_sales.groupby(['cust_id','store_id','month']).agg(aggregations).reset_index()
I am able to get the right aggregations . However the data frame contains an additional index in columns which I am not able to get rid of. Unable to show it, but here is the result from
list(grouped_at_Cust.columns.values)
[('cust_id', ''),
('store_id', ''),
('month', ''),
('lst_buy_dt1', 'max_lst_buy_dt'),
('purchase_amt', 'tot_purchase')]
Notice the hierarchy in the last 2 columns. How to get rid of it? I just need the columns max_lst_buy_dt
and tot_purchase
.
To reset index after group by, at first group according to a column using groupby(). After that, use reset_index().
Reset index without new column By default, DataFrame. reset_index() adds the current row index as a new 'index' column in DataFrame. If we do not want to add the new column, we can use the drop parameter. If drop=True then it does not add the new column of the current row index in the DataFrame.
reset_index in pandas is used to reset index of the dataframe object to default indexing (0 to number of rows minus 1) or to reset multi level index. By doing so, the original index gets converted to a column.
Edit: based on your comment, you can simply drop the first level of the columns index. For instance with a more complicated aggregation:
aggregations = {
'lst_buy_dt1': {
'max_lst_buy_dt': 'max',
'min_lst_buy_dt': 'min',
},
'purchase_amt': {
'tot_purchase': 'sum',
}
}
grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(aggregations).reset_index()
grouped_at_Cust.columns = grouped_at_Cust.columns.droplevel(0)
Output:
tot_purchase min_lst_buy_dt max_lst_buy_dt
0 cust_id 100 2015-10-07 2015-10-07
1 month 100 2015-10-20 2015-10-20
2 store_id 200 2015-10-09 2015-10-09
Original answer
I think your aggregations
dictionary is too complicated. If you follow the documentation:
agg = {
'lst_buy_dt1': 'max',
'purchase_amt': 'sum',
}
metro_sales.groupby(['cust_id','store_id','month']).agg(agg).reset_index()
Out[19]:
index purchase_amt lst_buy_dt1
0 cust_id 100 2015-10-07
1 month 100 2015-10-20
2 store_id 200 2015-10-09
All you need now is to rename the columns of the result:
grouped_at_Cust.rename(columns={
'lst_buy_dt1': 'max_lst_buy_dt',
'purchase_amt': 'tot_purchase'
})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With