I have the following dataframe:
                       uniq_id    value
2016-12-26 11:03:10        001      342
2016-12-26 11:03:13        004        5
2016-12-26 12:03:13        005       14
2016-12-26 12:03:13        008      114
2016-12-27 11:03:10        009      343
2016-12-27 11:03:13        013        5
2016-12-27 12:03:13        016      124
2016-12-27 12:03:13        018      114
I need to get the top N records for each day, sorted by value. Something like this (for N=2):
2016-12-26   001   342
             008   114
2016-12-27   009   343
             016   124
Please suggest the right way to do that in pandas 0.19.x.
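To make the examples reproducible, here is a minimal sketch that builds the sample frame (uniq_id as plain ints here, which matches the outputs below; the question displays them zero-padded):

import pandas as pd

idx = pd.to_datetime(['2016-12-26 11:03:10', '2016-12-26 11:03:13',
                      '2016-12-26 12:03:13', '2016-12-26 12:03:13',
                      '2016-12-27 11:03:10', '2016-12-27 11:03:13',
                      '2016-12-27 12:03:13', '2016-12-27 12:03:13'])
df = pd.DataFrame({'uniq_id': [1, 4, 5, 8, 9, 13, 16, 18],
                   'value':   [342, 5, 14, 114, 343, 5, 124, 114]},
                  index=idx)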
Unfortunately there is not yet a DataFrameGroupBy.nlargest() method, which would allow us to do the following:
df.groupby(...).nlargest(2, columns=['value'])
So here is a somewhat ugly, but working, solution:
In [73]: df.set_index(df.index.normalize()).reset_index().sort_values(['index','value'], ascending=[1,0]).groupby('index').head(2)
Out[73]:
       index  uniq_id  value
0 2016-12-26        1    342
3 2016-12-26        8    114
4 2016-12-27        9    343
6 2016-12-27       16    124
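For readability, here is the same chain broken into steps (a sketch with the same logic as the one-liner above; the intermediate names are mine):

daily = df.set_index(df.index.normalize())      # floor each timestamp to midnight
daily = daily.reset_index()                     # the day becomes a regular 'index' column
daily = daily.sort_values(['index', 'value'],   # day ascending, value descending
                          ascending=[True, False])
top2 = daily.groupby('index').head(2)           # first 2 rows per day = 2 largest values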
PS: I think there must be a better one...
UPDATE: if your DF doesn't have duplicated index values, the following solution should also work:
In [117]: df
Out[117]:
                     uniq_id  value
2016-12-26 11:03:10        1    342
2016-12-26 11:03:13        4      5
2016-12-26 12:03:13        5     14
2016-12-26 12:33:13        8    114    # <-- I've intentionally changed this index value
2016-12-27 11:03:10        9    343
2016-12-27 11:03:13       13      5
2016-12-27 12:03:13       16    124
2016-12-27 12:33:13       18    114    # <-- I've intentionally changed this index value
In [118]: df.groupby(pd.TimeGrouper('D')).apply(lambda x: x.nlargest(2, 'value')).reset_index(level=1, drop=1)
Out[118]:
            uniq_id  value
2016-12-26        1    342
2016-12-26        8    114
2016-12-27        9    343
2016-12-27       16    124
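Side note for readers on newer pandas: pd.TimeGrouper was deprecated in 0.21 and later removed, so on modern versions the equivalent spelling uses pd.Grouper (a sketch, not verified against every version):

(df.groupby(pd.Grouper(freq='D'))               # pd.Grouper replaces pd.TimeGrouper
   .apply(lambda x: x.nlargest(2, 'value'))
   .reset_index(level=1, drop=True))            # drop the original timestamp level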
df.set_index('uniq_id', append=True) \
    .groupby(df.index.date).value.nlargest(2) \
    .rename_axis([None, None, 'uniq_id']).reset_index(-1)
                                uniq_id  value
2016-12-26 2016-12-26 11:03:10        1    342
           2016-12-26 12:03:13        8    114
2016-12-27 2016-12-27 11:03:10        9    343
           2016-12-27 12:03:13       16    124
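The trick that makes this chain work: while DataFrameGroupBy has no nlargest(), SeriesGroupBy does, so selecting the single value column before calling nlargest(2) is what exposes the method; appending uniq_id to the index up front lets rename_axis()/reset_index(-1) bring it back out as a column at the end.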