 

Pandas - sort and head inside groupby

I have the following dataframe:

                       uniq_id    value
2016-12-26 11:03:10        001      342
2016-12-26 11:03:13        004        5
2016-12-26 12:03:13        005       14
2016-12-26 12:03:13        008      114
2016-12-27 11:03:10        009      343
2016-12-27 11:03:13        013        5
2016-12-27 12:03:13        016      124
2016-12-27 12:03:13        018      114

And I need to get the top N records for each day, sorted by value. Something like this (for N=2):

2016-12-26   001   342
             008   114
2016-12-27   009   343
             016   124

Please suggest the right way to do that in pandas 0.19.x.
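
For reference, the sample frame above can be rebuilt roughly like this (building uniq_id as plain integers is an assumption; the table displays them zero-padded):

import pandas as pd

idx = pd.to_datetime(['2016-12-26 11:03:10', '2016-12-26 11:03:13',
                      '2016-12-26 12:03:13', '2016-12-26 12:03:13',
                      '2016-12-27 11:03:10', '2016-12-27 11:03:13',
                      '2016-12-27 12:03:13', '2016-12-27 12:03:13'])
# uniq_id as ints here; the question displays them zero-padded (e.g. '001')
df = pd.DataFrame({'uniq_id': [1, 4, 5, 8, 9, 13, 16, 18],
                   'value': [342, 5, 14, 114, 343, 5, 124, 114]},
                  index=idx)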

Asked Dec 26 '16 by Alex Zaitsev


2 Answers

Unfortunately there is no such method as DataFrameGroupBy.nlargest() yet, which would allow us to do the following:

df.groupby(...).nlargest(2, columns=['value'])

So here is a somewhat ugly, but working, solution:

In [73]: df.set_index(df.index.normalize()).reset_index().sort_values(['index','value'], ascending=[1,0]).groupby('index').head(2)
Out[73]:
       index  uniq_id  value
0 2016-12-26        1    342
3 2016-12-26        8    114
4 2016-12-27        9    343
6 2016-12-27       16    124

PS: I think there must be a better one...
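
For instance, a variant of the same idea that keeps the original DatetimeIndex, using a temporary helper column (the _day name below is just illustrative), might look like this:

# tag each row with its day, sort by day then value (descending),
# take the first 2 rows per day, and drop the helper column again
tmp = df.assign(_day=df.index.normalize())
result = tmp.sort_values(['_day', 'value'], ascending=[True, False]) \
            .groupby('_day').head(2) \
            .drop('_day', axis=1)

This avoids the set_index/reset_index round-trip, at the cost of a temporary column that is dropped at the end.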

UPDATE: if your DF didn't have duplicated index values, the following solution would also work:

In [117]: df
Out[117]:
                     uniq_id  value
2016-12-26 11:03:10        1    342
2016-12-26 11:03:13        4      5
2016-12-26 12:03:13        5     14
2016-12-26 12:33:13        8    114    # <-- I've intentionally changed this index value
2016-12-27 11:03:10        9    343
2016-12-27 11:03:13       13      5
2016-12-27 12:03:13       16    124
2016-12-27 12:33:13       18    114    # <-- I've intentionally changed this index value

In [118]: df.groupby(pd.TimeGrouper('D')).apply(lambda x: x.nlargest(2, 'value')).reset_index(level=1, drop=1)
Out[118]:
            uniq_id  value
2016-12-26        1    342
2016-12-26        8    114
2016-12-27        9    343
2016-12-27       16    124
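
Note that in newer pandas versions pd.TimeGrouper has been removed; pd.Grouper(freq='D') should be the drop-in replacement there, so roughly:

# group by calendar day and keep the 2 largest values per group
df.groupby(pd.Grouper(freq='D')).apply(lambda x: x.nlargest(2, 'value')).reset_index(level=1, drop=True)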
Answered Oct 18 '22 by MaxU - stop WAR against UA


# move uniq_id into the index so nlargest keeps it, then take the
# 2 largest values per calendar day and pull uniq_id back out as a column
df.set_index('uniq_id', append=True) \
    .groupby(df.index.date).value.nlargest(2) \
    .rename_axis([None, None, 'uniq_id']).reset_index(-1)


                                uniq_id  value
2016-12-26 2016-12-26 11:03:10        1    342
           2016-12-26 12:03:13        8    114
2016-12-27 2016-12-27 11:03:10        9    343
           2016-12-27 12:03:13       16    124
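
If you want only the date in the index, as in the desired output, the intermediate timestamp level can be dropped as well; a rough sketch:

df.set_index('uniq_id', append=True) \
    .groupby(df.index.date).value.nlargest(2) \
    .rename_axis([None, None, 'uniq_id']).reset_index(-1) \
    .reset_index(level=1, drop=True)  # drop the original timestamp level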
Answered Oct 18 '22 by piRSquared