I have a df that contains daily product and volume data:
date        product     Volume
20160101    A           10
20160101    B           5
...
20160102    A           20
...
...
20160328    B           20
20160328    C           100
...
20160330    D           20
I've grouped it by month via:
df['yearmonth'] = df.date.astype(str).str[:6]
grouped = df.groupby(['yearmonth','product'])['Volume'].sum()
which gives me a Series of the form:
yearmonth   product 
201601      A       100
            B       90
            C       90
            D       85
            E       180
            F       50
            ...
201602      A       200
            C       120
            F       220
            G       40
            I       50
            ...
201603      B       120
            C       110
            D       110
            ...
I want to return the top n volume values within each month, along with their products. For example, the top 3 values would return:
201601  A  100
        B   90
        C   90
        E   180
201602  A   200
        C   120
        F   220
201603  B   120
        C   110
        D   110
I can find some answers using pd.IndexSlice and select, but they seem to act on the index alone. I can't figure out how to sort the values within each group.
You can use SeriesGroupBy.nlargest:
print (grouped.groupby(level='yearmonth').nlargest(3).reset_index(level=0, drop=True))
yearmonth  product
201601     E          180
           A          100
           B           90
201602     F          220
           A          200
           C          120
201603     B          120
           C          110
           D          110
Name: Volume, dtype: int64
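For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The Series below is hypothetical: it only contains the rows that are visible in the question for 201601 and 201602 (the elided rows are left out), and the names top3 and top3_ties are just for illustration. It also shows the keep='all' option of nlargest, which keeps tied values and therefore may return more than n rows per month.

```python
import pandas as pd

# Hypothetical monthly totals: only the rows visible in the question,
# not the full data set.
index = pd.MultiIndex.from_tuples(
    [("201601", "A"), ("201601", "B"), ("201601", "C"),
     ("201601", "D"), ("201601", "E"), ("201601", "F"),
     ("201602", "A"), ("201602", "C"), ("201602", "F"),
     ("201602", "G"), ("201602", "I")],
    names=["yearmonth", "product"],
)
grouped = pd.Series([100, 90, 90, 85, 180, 50, 200, 120, 220, 40, 50],
                    index=index, name="Volume")

# nlargest prepends the group key as an extra index level,
# so drop that duplicated first level afterwards.
top3 = (grouped.groupby(level="yearmonth")
               .nlargest(3)
               .reset_index(level=0, drop=True))

# keep='all' retains ties: B and C both have 90 in 201601,
# so that month returns four rows instead of three.
top3_ties = (grouped.groupby(level="yearmonth")
                    .nlargest(3, keep="all")
                    .reset_index(level=0, drop=True))

print(top3)
print(top3_ties)
```

With the default keep='first', a tie at the cut-off is resolved by position, which is why B appears in the output above and C does not.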
You can also use to_datetime with to_period to convert the dates to year-month periods:
print (df)
        date product  Volume
0   20160101       A      10
1   20160101       B       5
2   20160101       C      10
3   20160101       D       5
4   20160102       A      20
5   20160102       A      10
6   20160102       B       5
7   20160102       C      10
8   20160102       D       5
9   20160328       A      20
10  20160328       C     100
11  20160328       B      20
12  20160328       D      20
13  20160330       D      20
grouped = df.groupby([pd.to_datetime(df.date, format='%Y%m%d').dt.to_period('M'),
                     'product'])['Volume'].sum()
print (grouped)
date     product
2016-01  A           40
         B           10
         C           20
         D           10
2016-03  A           20
         B           20
         C          100
         D           40
Name: Volume, dtype: int64
print (grouped.groupby(level='date').nlargest(3).reset_index(level=0, drop=True))
date     product
2016-01  A           40
         C           20
         B           10
2016-03  C          100
         D           40
         A           20
Name: Volume, dtype: int64
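As an alternative to nlargest (a different technique, not the one used above), you can sort the monthly totals and take the first n rows of each group with head. The sketch below rebuilds the sample df printed above and uses the string yearmonth column from the question; the names monthly and top3 are only illustrative.

```python
import pandas as pd

# Rebuild the sample frame shown above (14 rows).
df = pd.DataFrame({
    "date": [20160101] * 4 + [20160102] * 5 + [20160328] * 4 + [20160330],
    "product": list("ABCD") + list("AABCD") + list("ACBD") + ["D"],
    "Volume": [10, 5, 10, 5, 20, 10, 5, 10, 5, 20, 100, 20, 20, 20],
})

# Year-month key as a string, as in the question.
df["yearmonth"] = df["date"].astype(str).str[:6]

# Monthly total per product, kept as ordinary columns.
monthly = df.groupby(["yearmonth", "product"], as_index=False)["Volume"].sum()

# Sort by month, then by volume descending, and keep the first 3 rows per month.
top3 = (monthly.sort_values(["yearmonth", "Volume"], ascending=[True, False])
               .groupby("yearmonth")
               .head(3)
               .set_index(["yearmonth", "product"])["Volume"])

print(top3)
```

Unlike nlargest, head does not add an extra index level, so no reset_index is needed. It has no option for keeping ties, though: rows tied at the cut-off are kept or dropped according to the sort order.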