
How to reset indexes when aggregating multiple columns in pandas

I have a dataframe that I am trying to group, which looks like this:

Cust_ID  Store_ID  month  lst_buy_dt1  purchase_amt
1        20        10     2015-10-07   100
1        20        10     2015-10-09   200
1        20        10     2015-10-20   100

I need the maximum of lst_buy_dt1 and the total purchase amount for each Cust_ID, Store_ID combination for each month, in a separate dataframe. Sample output:

Cust_ID  Store_ID  month  max_lst_buy_dt  tot_purchase_amt
1        20        10     2015-10-20      400

My code is below:

aggregations = {
    'lst_buy_dt1': {
        'max_lst_buy_dt': 'max',   # latest purchase date in the month
    },
    'purchase_amt': {
        'tot_purchase': 'sum',     # total purchase amount in the month
    }
}

grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(aggregations).reset_index()

I get the right aggregations. However, the resulting dataframe has an extra level in its column index that I am not able to get rid of. I can't show the dataframe itself, but here is the result of

list(grouped_at_Cust.columns.values)

[('cust_id', ''),
('store_id', ''),
('month', ''),
('lst_buy_dt1', 'max_lst_buy_dt'),
('purchase_amt', 'tot_purchase')]

Notice the hierarchy in the last two columns. How do I get rid of it? I just need the columns max_lst_buy_dt and tot_purchase.

sourav asked Sep 19 '16 08:09



1 Answer

Edit: based on your comment, you can simply drop the first level of the column index. For instance, with a more complicated aggregation:

aggregations = {
    'lst_buy_dt1': {
        'max_lst_buy_dt': 'max',       
        'min_lst_buy_dt': 'min',       
    },
    'purchase_amt': {
        'tot_purchase': 'sum',
    }
}
grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(aggregations).reset_index()
grouped_at_Cust.columns = grouped_at_Cust.columns.droplevel(0)

Output:

             tot_purchase min_lst_buy_dt max_lst_buy_dt
0   cust_id           100     2015-10-07     2015-10-07
1     month           100     2015-10-20     2015-10-20
2  store_id           200     2015-10-09     2015-10-09
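One thing to watch for: when reset_index() runs before droplevel(0), the key columns sit under an empty second-level label (as in the ('cust_id', '') tuples from the question), so dropping the first level leaves them with an empty name. A minimal sketch of one way around this is to drop the level first and reset the index afterwards. The metro_sales frame below is rebuilt from the sample rows in the question purely for illustration, and a dict of lists is used to produce the two-level columns, which is an assumption about how you would write it on a recent pandas version rather than the code above.

```python
import pandas as pd

# Sample rows copied from the question; rebuilt here only for illustration.
metro_sales = pd.DataFrame({
    'cust_id': [1, 1, 1],
    'store_id': [20, 20, 20],
    'month': [10, 10, 10],
    'lst_buy_dt1': pd.to_datetime(['2015-10-07', '2015-10-09', '2015-10-20']),
    'purchase_amt': [100, 200, 100],
})

# A dict of lists yields two-level columns such as ('lst_buy_dt1', 'max').
grouped = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(
    {'lst_buy_dt1': ['max', 'min'], 'purchase_amt': ['sum']}
)

# Drop the top level while the group keys are still in the row index,
# then move the keys into ordinary columns so they keep their names.
grouped.columns = grouped.columns.droplevel(0)
grouped_at_Cust = grouped.reset_index()

print(list(grouped_at_Cust.columns))
```

With this ordering the remaining column labels are the function names ('max', 'min', 'sum'), so a rename is still needed if you want labels like max_lst_buy_dt.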

Original answer

I think your aggregations dictionary is too complicated. If you follow the documentation, a flat dictionary is enough:

agg = {
    'lst_buy_dt1': 'max',       
    'purchase_amt': 'sum',
}
metro_sales.groupby(['cust_id','store_id','month']).agg(agg).reset_index()
Out[19]: 
      index  purchase_amt lst_buy_dt1
0   cust_id           100  2015-10-07
1     month           100  2015-10-20
2  store_id           200  2015-10-09

All you need now is to rename the columns of the result:

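# rename() returns a new dataframe, so assign the result
grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(agg).reset_index()
grouped_at_Cust = grouped_at_Cust.rename(columns={
    'lst_buy_dt1': 'max_lst_buy_dt',
    'purchase_amt': 'tot_purchase',
})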
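If you are on a newer pandas, note that the nested renaming dictionary from the question is rejected by pandas 1.0 and later (it raises a SpecificationError). Named aggregation, available since pandas 0.25, aggregates and renames in one step and never creates the extra column level. A minimal sketch, with metro_sales rebuilt from the sample rows in the question for illustration only:

```python
import pandas as pd

# Sample rows copied from the question; rebuilt here only for illustration.
metro_sales = pd.DataFrame({
    'cust_id': [1, 1, 1],
    'store_id': [20, 20, 20],
    'month': [10, 10, 10],
    'lst_buy_dt1': pd.to_datetime(['2015-10-07', '2015-10-09', '2015-10-20']),
    'purchase_amt': [100, 200, 100],
})

# Named aggregation: output_name=(input_column, function).
grouped_at_Cust = (
    metro_sales
    .groupby(['cust_id', 'store_id', 'month'])
    .agg(
        max_lst_buy_dt=('lst_buy_dt1', 'max'),
        tot_purchase=('purchase_amt', 'sum'),
    )
    .reset_index()
)

print(grouped_at_Cust)
```

The result has flat columns cust_id, store_id, month, max_lst_buy_dt and tot_purchase, so no droplevel or rename step is needed.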
IanS answered Dec 08 '22 01:12