Returning top n values for group/multiindex in Pandas

I have a df that contains daily product and volume data:

date        product     Volume
20160101    A           10
20160101    B           5
...
20160102    A           20
...
...
20160328    B           20
20160328    C           100
...
20160330    D           20

I've grouped it by month with:

df['yearmonth'] = df.date.astype(str).str[:6]
grouped = df.groupby(['yearmonth','product'])['Volume'].sum()

which gives me a Series of the form:

yearmonth   product 
201601      A       100
            B       90
            C       90
            D       85
            E       180
            F       50
            ...
201602      A       200
            C       120
            F       220
            G       40
            I       50
            ...
201603      B       120
            C       110
            D       110
            ...

I want to return the n largest volume values within each month, together with their products. For example, the top 3 values would return:

201601  A  100
        B   90
        C   90
        E   180
201602  A   200
        C   120
        F   220
201603  B   120
        C   110
        D   110

I can find some answers using pd.IndexSlice and select, but they seem to act on the index alone. I can't figure out how to sort the values within each individual group. Related questions I've looked at:

Pandas report top-n in group and pivot (which is Wes's example in "Python for Data Analysis" too)
pandas multi index sort specific fields
pandas: slice a MultiIndex by range of secondary index

You can use SeriesGroupBy.nlargest:

print (grouped.groupby(level='yearmonth').nlargest(3).reset_index(level=0, drop=True))
yearmonth  product
201601     E          180
           A          100
           B           90
201602     F          220
           A          200
           C          120
201603     B          120
           C          110
           D          110
Name: Volume, dtype: int64
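
One thing to watch: by default nlargest breaks ties by position (keep='first'), so in 201601 product C (also 90) is dropped, and each month comes back sorted by value rather than by product. The desired output in the question lists both B and C and is ordered by product. Here is a minimal sketch of one way to get that, assuming tied values should all be kept; the Series is rebuilt by hand from the question's example, with the elided "..." rows left out:

```python
import pandas as pd

# Monthly totals copied from the question's example; rows hidden behind "..." are omitted.
index = pd.MultiIndex.from_tuples(
    [
        ("201601", "A"), ("201601", "B"), ("201601", "C"),
        ("201601", "D"), ("201601", "E"), ("201601", "F"),
        ("201602", "A"), ("201602", "C"), ("201602", "F"),
        ("201602", "G"), ("201602", "I"),
        ("201603", "B"), ("201603", "C"), ("201603", "D"),
    ],
    names=["yearmonth", "product"],
)
grouped = pd.Series(
    [100, 90, 90, 85, 180, 50, 200, 120, 220, 40, 50, 120, 110, 110],
    index=index,
    name="Volume",
)

top3 = (
    grouped.groupby(level="yearmonth")
    .nlargest(3, keep="all")          # keep="all" retains every row tied for the last place
    .reset_index(level=0, drop=True)  # drop the group key that groupby prepends to the index
    .sort_index()                     # back to (yearmonth, product) order
)
print(top3)
```

With keep='all' a month can return more than n rows when there is a tie, which is why 201601 yields four products here.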

You can also use to_datetime with to_period to convert the dates to year-month periods:

print (df)
        date product  Volume
0   20160101       A      10
1   20160101       B       5
2   20160101       C      10
3   20160101       D       5
4   20160102       A      20
5   20160102       A      10
6   20160102       B       5
7   20160102       C      10
8   20160102       D       5
9   20160328       A      20
10  20160328       C     100
11  20160328       B      20
12  20160328       D      20
13  20160330       D      20

grouped = df.groupby([pd.to_datetime(df.date, format='%Y%m%d').dt.to_period('M'),
                     'product'])['Volume'].sum()
print (grouped)
date     product
2016-01  A           40
         B           10
         C           20
         D           10
2016-03  A           20
         B           20
         C          100
         D           40
Name: Volume, dtype: int64

print (grouped.groupby(level='date').nlargest(3).reset_index(level=0, drop=True))
date     product
2016-01  A           40
         C           20
         B           10
2016-03  C          100
         D           40
         A           20
Name: Volume, dtype: int64
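
If you want to run the steps above end to end, they can be wrapped up roughly like this. This is only a sketch: top_n_per_month is a hypothetical helper name, not a pandas function, and the date column is cast to str first on the assumption that it is stored as integers like 20160101.

```python
import pandas as pd

# Sample frame matching the printed df above.
df = pd.DataFrame({
    "date": [20160101] * 4 + [20160102] * 5 + [20160328] * 4 + [20160330],
    "product": list("ABCD") + list("AABCD") + list("ACBD") + ["D"],
    "Volume": [10, 5, 10, 5, 20, 10, 5, 10, 5, 20, 100, 20, 20, 20],
})


def top_n_per_month(frame, n=3):
    """Hypothetical helper: n largest monthly Volume totals per month."""
    # Convert yyyymmdd values to monthly periods (the Series keeps the name "date").
    month = pd.to_datetime(frame["date"].astype(str), format="%Y%m%d").dt.to_period("M")
    # Sum Volume per (month, product).
    totals = frame.groupby([month, "product"])["Volume"].sum()
    # Take the n largest totals within each month, then drop the duplicated month level.
    return totals.groupby(level="date").nlargest(n).reset_index(level=0, drop=True)


result = top_n_per_month(df, n=3)
print(result)
```

Ties are resolved by position here (the default keep='first'), which is why B rather than D is returned for 2016-01.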

Tags: python, sorting, pandas

comedyDave

1 Answer

jezrael
