I have a series that looks like this:
            delivery
2007-04-26  706           23
2007-04-27  705           10
            706         1089
            708           83
            710           13
            712           51
            802            4
            806            1
            812            3
2007-04-29  706           39
            708            4
            712            1
2007-04-30  705            3
            706         1016
            707            2
...
2014-11-04  1412          53
            1501           1
            1502           1
            1512           1
2014-11-05  1411          47
            1412        1334
            1501          40
            1502         433
            1504         126
            1506         100
            1508           7
            1510           6
            1512          51
            1604           1
            1612           5
Length: 26255, dtype: int64
where the query is: df.groupby([df.index.date, 'delivery']).size()
For each day, I need to pull out the delivery number which has the most volume. I feel like it would be something like:
df.groupby([df.index.date, 'delivery']).size().idxmax(axis=1)
However, this just returns the idxmax for the entire Series. Instead, I need the second-level idxmax (the delivery number, not the date) for each day, i.e. the result should be a vector with one entry per day.
Any ideas on how to accomplish this?
Your example code doesn't work because idxmax is applied to the result of the groupby operation, which is one single Series, so it looks for the maximum over the whole thing instead of per day.
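To illustrate the point, here is a minimal sketch on a made-up miniature of the grouped Series. The values and the names counts and overall_best are assumptions for the example, not the asker's real data:

```python
import pandas as pd

# Hypothetical miniature of the grouped Series from the question
index = pd.MultiIndex.from_tuples(
    [("2007-04-26", 706), ("2007-04-27", 705), ("2007-04-27", 708), ("2007-04-28", 45)],
    names=["Date", "delivery"],
)
counts = pd.Series([23, 10, 1089, 100], index=index)

# idxmax sees one flat Series, so it returns a single (Date, delivery) label
overall_best = counts.idxmax()
print(overall_best)
```

You get one label for the overall maximum, not one per day, which matches what the question describes.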
I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.
Setting up the data:
import pandas as pd

# Small sample: several deliveries per date, each with a count
d = {'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
     'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
              '2007-04-27', '2007-04-28', '2007-04-28']}
df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
print(df)
Output:
            DeliveryCount  DeliveryNb
Date
2007-04-26             23         706
2007-04-27             10         705
2007-04-27           1089         708
2007-04-27             82         450
2007-04-27             34         283
2007-04-28            100          45
2007-04-28             11          89
Creating a custom function:
The trick is to use the reset_index() method, so that each group gets a plain integer index (0, 1, 2, ...) and idxmax returns a position you can pass to iloc.
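def func(df):
    # df is the sub-frame for one date; after reset_index, idxmax gives a row position
    idx = df.reset_index()['DeliveryCount'].idxmax()
    # look up the delivery number at that position
    return df['DeliveryNb'].iloc[idx]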
Applying it:
g = df.groupby(df.index)
g.apply(func)
Result:
Date
2007-04-26    706
2007-04-27    708
2007-04-28     45
dtype: int64
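If you'd rather avoid the custom function, the same reset_index idea can probably be used without apply. This is only a sketch on the sample data above; the names flat, best_rows and best_delivery are made up for the example:

```python
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame(
    {
        "DeliveryNb": [706, 705, 708, 450, 283, 45, 89],
        "DeliveryCount": [23, 10, 1089, 82, 34, 100, 11],
    },
    index=pd.Index(
        ["2007-04-26", "2007-04-27", "2007-04-27", "2007-04-27",
         "2007-04-27", "2007-04-28", "2007-04-28"],
        name="Date",
    ),
)

# After reset_index every row has a unique integer label and Date is a column
flat = df.reset_index()
# Per date, the row label of the largest DeliveryCount
best_rows = flat.groupby("Date")["DeliveryCount"].idxmax()
# Select those rows and keep the delivery number, indexed by date
best_delivery = flat.loc[best_rows].set_index("Date")["DeliveryNb"]
print(best_delivery)
```

On this sample it should pick the same delivery numbers as the apply version. For the Series in the question you would first have to flatten it in a similar way (for example with reset_index and a name for the counts column), which I haven't tested on the real data.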