I have a table called rev_df that looks like this:
     pcid        date   rep    rev  new_rev  diff  Period
0  523468  2017-01-01  1127  16.60        0   NaN       1
1  523468  2017-01-02  1127  41.32        0     1       1
2  523468  2017-01-03  4568  52.39        0     1       1
3  523468  2017-01-04  4568  47.31        0     1       2
This is the line of code in question that's causing some PROBLEMS™.
rev_df_period = rev_df.groupby(['pcid', 'Period']).agg({'date': [np.min, np.max],
                                                        'rev': np.sum,
                                                        'new_rev': np.sum,
                                                        'rep': lambda x: x.unique()
                                                        })
The lambda x: x.unique() is causing the following error:
ValueError: Function does not reduce
Through testing, I found that if I change the last agg lambda function to .nunique(), it doesn't throw an error. But I need the list of unique rep values, not the number of values.
Any ideas?
The output should look like this:
               new_rev        date                  rev          rep
                   sum        amin        amax      sum       unique
pcid   Period
523468 1             0  2017-01-01  2017-02-01  1026.94  [1127,4568]
       2             0  2017-03-24  2017-03-30    90.00         4568
You can try this:
rev_df.groupby(['pcid', 'Period']).agg({'date': [np.min, np.max],
                                        'rev': np.sum,
                                        'new_rev': np.sum,
                                        'rep': lambda x: list(set(x))
                                        })
Output:
                     date                 rev  new_rev           rep
                     amin        amax     sum      sum      <lambda>
pcid   Period
523468 1       2017-01-01  2017-01-03  110.31        0  [4568, 1127]
       2       2017-01-04  2017-01-04   47.31        0        [4568]
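One caveat with list(set(x)): a set does not keep insertion order, which is why the output above shows [4568, 1127] rather than [1127, 4568]. If the order matters, something along these lines may work. This is only a sketch: it rebuilds the sample rows from the question by hand (the diff column is left out) and aggregates just the rep column with Series.unique().tolist(), which keeps values in order of first appearance.

```python
import pandas as pd

# Sample rows copied from the question's rev_df (diff column omitted).
rev_df = pd.DataFrame({
    "pcid": [523468] * 4,
    "date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"]),
    "rep": [1127, 1127, 4568, 4568],
    "rev": [16.60, 41.32, 52.39, 47.31],
    "new_rev": [0, 0, 0, 0],
    "Period": [1, 1, 1, 2],
})

# unique() returns an array, which agg refuses to treat as a single value;
# tolist() converts it to a plain Python list, which agg stores as-is.
reps = rev_df.groupby(["pcid", "Period"])["rep"].agg(lambda x: x.unique().tolist())
print(reps)
```

The same lambda can be dropped into the 'rep' entry of the dictionary passed to agg.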
Edit: to get a proper column name instead of <lambda>, give the function a name:
f = lambda x: list(set(x))
f.__name__ = 'unique'
rev_df.groupby(['pcid', 'Period']).agg({'date': [np.min, np.max],
                                        'rev': np.sum,
                                        'new_rev': np.sum,
                                        'rep': f
                                        })
Output:
                     date                 rev  new_rev           rep
                     amin        amax     sum      sum        unique
pcid   Period
523468 1       2017-01-01  2017-01-03  110.31        0  [4568, 1127]
       2       2017-01-04  2017-01-04   47.31        0        [4568]
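As an alternative to setting __name__ by hand (not part of the answer above): on pandas 0.25 or later, named aggregation lets you pick each output column name directly and gives flat columns instead of a two-level header. A minimal sketch, assuming the sample rows from the question; the output names (date_min, rep_unique and so on) are my own choices, and sorted(set(x)) is used so the list order is deterministic.

```python
import pandas as pd

# Sample rows copied from the question's rev_df (diff column omitted).
rev_df = pd.DataFrame({
    "pcid": [523468] * 4,
    "date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"]),
    "rep": [1127, 1127, 4568, 4568],
    "rev": [16.60, 41.32, 52.39, 47.31],
    "new_rev": [0, 0, 0, 0],
    "Period": [1, 1, 1, 2],
})

# Each keyword is an output column name; each value is (source column, function).
summary = rev_df.groupby(["pcid", "Period"]).agg(
    date_min=("date", "min"),
    date_max=("date", "max"),
    rev_sum=("rev", "sum"),
    new_rev_sum=("new_rev", "sum"),
    rep_unique=("rep", lambda x: sorted(set(x))),  # sorted list of distinct reps
)
print(summary)
```

Passing the string names "min", "max" and "sum" instead of np.min, np.max and np.sum also avoids the deprecation warnings that newer pandas versions emit for the NumPy functions.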