I have the following dataframe:
uniq_id value
2016-12-26 11:03:10 001 342
2016-12-26 11:03:13 004 5
2016-12-26 12:03:13 005 14
2016-12-26 12:03:13 008 114
2016-12-27 11:03:10 009 343
2016-12-27 11:03:13 013 5
2016-12-27 12:03:13 016 124
2016-12-27 12:03:13 018 114
And I need to get the top N records for each day, sorted by value. Something like this (for N=2):
2016-12-26 001 342
008 114
2016-12-27 009 343
016 124
Please suggest the right way to do that in pandas 0.19.x.
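For reference, the frame can be reconstructed like this (uniq_id is zero-padded in the display above, but the answers below show it as plain integers, so integers are used here):

import pandas as pd

# rebuild the example frame from the question
idx = pd.to_datetime(['2016-12-26 11:03:10', '2016-12-26 11:03:13',
                      '2016-12-26 12:03:13', '2016-12-26 12:03:13',
                      '2016-12-27 11:03:10', '2016-12-27 11:03:13',
                      '2016-12-27 12:03:13', '2016-12-27 12:03:13'])
df = pd.DataFrame({'uniq_id': [1, 4, 5, 8, 9, 13, 16, 18],
                   'value':   [342, 5, 14, 114, 343, 5, 124, 114]}, index=idx)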
Unfortunately, there is not yet a DataFrameGroupBy.nlargest() method, which would let us simply do the following:
df.groupby(...).nlargest(2, columns=['value'])
So here is a bit ugly, but working solution:
In [73]: df.set_index(df.index.normalize()).reset_index().sort_values(['index','value'], ascending=[1,0]).groupby('index').head(2)
Out[73]:
index uniq_id value
0 2016-12-26 1 342
3 2016-12-26 8 114
4 2016-12-27 9 343
6 2016-12-27 16 124
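The same chain, broken into steps for readability (the intermediate names are mine):

daily = df.set_index(df.index.normalize())     # floor each timestamp to midnight
top2 = (daily.reset_index()                    # the day becomes a column named 'index'
             .sort_values(['index', 'value'],  # sort by day ascending, value descending
                          ascending=[True, False])
             .groupby('index')
             .head(2))                         # first 2 rows per day = top 2 by value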
PS: I think there must be a better one...
UPDATE: if your DF didn't have duplicated index values, the following solution would work as well:
In [117]: df
Out[117]:
uniq_id value
2016-12-26 11:03:10 1 342
2016-12-26 11:03:13 4 5
2016-12-26 12:03:13 5 14
2016-12-26 12:33:13 8 114 # <-- I've intentionally changed this index value
2016-12-27 11:03:10 9 343
2016-12-27 11:03:13 13 5
2016-12-27 12:03:13 16 124
2016-12-27 12:33:13 18 114 # <-- I've intentionally changed this index value
In [118]: df.groupby(pd.TimeGrouper('D')).apply(lambda x: x.nlargest(2, 'value')).reset_index(level=1, drop=1)
Out[118]:
uniq_id value
2016-12-26 1 342
2016-12-26 8 114
2016-12-27 9 343
2016-12-27 16 124
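Note for readers on newer pandas: pd.TimeGrouper was deprecated after 0.19 and has since been removed; pd.Grouper(freq='D') is the equivalent spelling, with the rest of the chain unchanged:

# pd.Grouper replaces the deprecated pd.TimeGrouper on modern pandas
(df.groupby(pd.Grouper(freq='D'))
   .apply(lambda x: x.nlargest(2, 'value'))
   .reset_index(level=1, drop=True))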
df.set_index('uniq_id', append=True) \
.groupby(df.index.date).value.nlargest(2) \
.rename_axis([None, None, 'uniq_id']).reset_index(-1)
uniq_id value
2016-12-26 2016-12-26 11:03:10 1 342
2016-12-26 12:03:13 8 114
2016-12-27 2016-12-27 11:03:10 9 343
2016-12-27 12:03:13 16 124
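Step by step, that chain does the following (the intermediate names are mine):

# keep uniq_id in the index so it survives selecting the 'value' Series
stacked = df.set_index('uniq_id', append=True)
# SeriesGroupBy.nlargest does exist (unlike DataFrameGroupBy.nlargest), so
# group the 'value' column by calendar date and take the top 2 per day
top2 = stacked.groupby(df.index.date).value.nlargest(2)
# name the three resulting index levels, then move uniq_id back out into a column
result = top2.rename_axis([None, None, 'uniq_id']).reset_index(-1)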