I have a df that contains daily product and volume data:
date product Volume
20160101 A 10
20160101 B 5
...
20160102 A 20
...
...
20160328 B 20
20160328 C 100
...
20160330 D 20
I've grouped it up by month via
df['yearmonth'] = df.date.astype(str).str[:6]
grouped = df.groupby(['yearmonth','product'])['Volume'].sum()
which gives me a Series of the form:
yearmonth product
201601 A 100
B 90
C 90
D 85
E 180
F 50
...
201602 A 200
C 120
F 220
G 40
I 50
...
201603 B 120
C 110
D 110
...
I want to return the top n products by volume within each month. For example, the top 3 would return:
201601 A 100
B 90
C 90
E 180
201602 A 200
C 120
F 220
201603 B 120
C 110
D 110
I can find some answers using pd.IndexSlice and select, but they seem to act on the index alone. I can't figure out how to sort the individual groups' values.
You can use SeriesGroupBy.nlargest:
print (grouped.groupby(level='yearmonth').nlargest(3).reset_index(level=0, drop=True))
yearmonth product
201601 E 180
A 100
B 90
201602 F 220
A 200
C 120
201603 B 120
C 110
D 110
Name: Volume, dtype: int64
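To see why the reset_index call is there, here is a minimal self-contained sketch. It rebuilds only the rows visible in the question (the rows elided with "..." are left out, so treat the numbers as illustrative), and the Series name Volume is an assumption. Grouping on a level and then calling nlargest prepends the group key as an extra index level, so yearmonth appears twice until the outer copy is dropped.

```python
import pandas as pd

# Only the rows shown in the question; rows elided with "..." are omitted.
index = pd.MultiIndex.from_tuples(
    [
        ("201601", "A"), ("201601", "B"), ("201601", "C"),
        ("201601", "D"), ("201601", "E"), ("201601", "F"),
        ("201602", "A"), ("201602", "C"), ("201602", "F"),
        ("201602", "G"), ("201602", "I"),
        ("201603", "B"), ("201603", "C"), ("201603", "D"),
    ],
    names=["yearmonth", "product"],
)
grouped = pd.Series(
    [100, 90, 90, 85, 180, 50, 200, 120, 220, 40, 50, 120, 110, 110],
    index=index,
    name="Volume",  # assumed name, matching the column summed in the question
)

# nlargest runs once per month; the group key is added as a new outer level,
# so the index is now (yearmonth, yearmonth, product).
raw = grouped.groupby(level="yearmonth").nlargest(3)

# Drop the duplicated outer level to get back to (yearmonth, product).
top3 = raw.reset_index(level=0, drop=True)
print(top3)
```

With the default keep='first', a tie for the last place is broken by position, which is why B is returned for 201601 and C is not.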
You can also use to_datetime with to_period to convert the dates to a year-month period:
print (df)
date product Volume
0 20160101 A 10
1 20160101 B 5
2 20160101 C 10
3 20160101 D 5
4 20160102 A 20
5 20160102 A 10
6 20160102 B 5
7 20160102 C 10
8 20160102 D 5
9 20160328 A 20
10 20160328 C 100
11 20160328 B 20
12 20160328 D 20
13 20160330 D 20
grouped = df.groupby([pd.to_datetime(df.date, format='%Y%m%d').dt.to_period('M'),
                      'product'])['Volume'].sum()
print (grouped)
date product
2016-01 A 40
B 10
C 20
D 10
2016-03 A 20
B 20
C 100
D 40
Name: Volume, dtype: int64
print (grouped.groupby(level='date').nlargest(3).reset_index(level=0, drop=True))
date product
2016-01 A 40
C 20
B 10
2016-03 C 100
D 40
A 20
Name: Volume, dtype: int64
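Note that nlargest(3) returns at most three rows per month, so when two products tie for the last place (as B and C do at 90 in the question's 201601 data) only the first one is kept. If tied rows should all be returned, as in the question's expected output, one option is to call Series.nlargest with keep='all' inside apply. The sketch below illustrates that variant on the 201601 and 201602 rows shown in the question; the Series name Volume is again an assumption.

```python
import pandas as pd

# Rows for 201601 and 201602 as shown in the question (elided rows omitted).
index = pd.MultiIndex.from_tuples(
    [
        ("201601", "A"), ("201601", "B"), ("201601", "C"),
        ("201601", "D"), ("201601", "E"), ("201601", "F"),
        ("201602", "A"), ("201602", "C"), ("201602", "F"),
        ("201602", "G"), ("201602", "I"),
    ],
    names=["yearmonth", "product"],
)
grouped = pd.Series(
    [100, 90, 90, 85, 180, 50, 200, 120, 220, 40, 50],
    index=index,
    name="Volume",  # assumed name
)

# group_keys=False keeps the original (yearmonth, product) index,
# so no reset_index is needed afterwards.
# keep="all" retains every row tied with the n-th largest value.
top3_with_ties = (
    grouped.groupby(level="yearmonth", group_keys=False)
    .apply(lambda s: s.nlargest(3, keep="all"))
)
print(top3_with_ties)
```

Because of the tie at 90, 201601 yields four rows here (A, B, C and E), while 201602 still yields three.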