Pandas get topmost n records within each group

Did you try df.groupby('id').head(2)?

Output generated:

>>> df.groupby('id').head(2)
       id  value
id             
1  0   1      1
   1   1      2 
2  3   2      1
   4   2      2
3  7   3      1
4  8   4      1

(Keep in mind that head(2) returns the first two rows of each group in their existing order, so you may need to sort beforehand, depending on your data.)

EDIT: As mentioned by the questioner, use df.groupby('id').head(2).reset_index(drop=True) to remove the MultiIndex and flatten the results.

>>> df.groupby('id').head(2).reset_index(drop=True)
    id  value
0   1      1
1   1      2
2   2      1
3   2      2
4   3      1
5   4      1
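
To illustrate the note about sorting first, here is a minimal sketch. The sample frame is an assumption, reconstructed from the outputs shown on this page. It sorts by value in descending order within each id before calling head(2), so that the two largest values per group are kept rather than the first two rows.

```python
import pandas as pd

# Sample data reconstructed from the outputs shown on this page (an assumption).
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})

# Sort so the largest values come first inside each id,
# then keep the first two rows of every group and renumber the index.
top2 = (
    df.sort_values(["id", "value"], ascending=[True, False])
      .groupby("id")
      .head(2)
      .reset_index(drop=True)
)
print(top2)
```

Groups with fewer than two rows (ids 3 and 4 here) simply contribute whatever rows they have.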

Since pandas 0.14.1, you can call nlargest and nsmallest on a groupby object:

In [23]: df.groupby('id')['value'].nlargest(2)
Out[23]: 
id   
1   2    3
    1    2
2   6    4
    5    3
3   7    1
4   8    1
dtype: int64

One quirk is that the result also carries the original index as a second index level, but this might be really useful depending on what your original index was.

If you're not interested in it, you can do .reset_index(level=1, drop=True) to get rid of it altogether.
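
As a sketch of the two steps described above, assuming the same sample frame reconstructed from the outputs on this page, the following takes the two largest values per id and then drops the inner index level:

```python
import pandas as pd

# Sample data reconstructed from the outputs shown on this page (an assumption).
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})

# Two largest values per id; the result is indexed by (id, original row index).
top2 = df.groupby("id")["value"].nlargest(2)

# Drop the inner level (the original row index) and keep only id as the index.
flat = top2.reset_index(level=1, drop=True)
print(flat)
```

If you would rather have id as a regular column, calling flat.reset_index() afterwards turns the Series into a two-column DataFrame.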

(Note: From 0.17.1 you'll be able to do this on a DataFrameGroupBy too but for now it only works with Series and SeriesGroupBy.)

Sometimes sorting the whole dataset ahead of time is very time consuming. Instead, we can group first and take the top k rows within each group:

g = df.groupby(['id']).apply(lambda x: x.nlargest(topk,['value'])).reset_index(drop=True)
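
A self-contained sketch of the same per-group idea follows. The sample frame and the value topk = 2 are assumptions for illustration. Because the way groupby.apply treats the grouping column differs between pandas releases, this version swaps apply for a plain loop over the groups and concatenates the pieces, which should give the same rows without a global sort.

```python
import pandas as pd

# Sample data reconstructed from the outputs shown on this page (an assumption).
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})

topk = 2  # illustrative choice; not defined in the original snippet

# Take the topk largest rows of each group, then stitch the pieces together.
# ignore_index=True plays the role of reset_index(drop=True).
pieces = [group.nlargest(topk, "value") for _, group in df.groupby("id")]
g = pd.concat(pieces, ignore_index=True)
print(g)
```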

df.groupby('id').apply(lambda x : x.sort_values(by = 'value', ascending = False).head(2).reset_index(drop = True))

Here, sort_values with ascending=False gives a result similar to nlargest, and ascending=True gives a result similar to nsmallest.
The number passed to head plays the same role as the number passed to nlargest: it sets how many rows are kept for each group.
The reset_index call is optional.
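
To check the ascending=True case, here is a small sketch under the same assumed sample frame. It compares sorting plus head(n) against nsmallest on the grouped column; the variable names are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the outputs shown on this page (an assumption).
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})

n = 2

# ascending=True followed by head(n) keeps the n smallest values per group.
via_sort = (
    df.sort_values("value", ascending=True, kind="stable")
      .groupby("id")
      .head(n)
      .sort_values(["id", "value"])  # only to line the rows up for comparison
      .reset_index(drop=True)
)

# nsmallest on the grouped column yields the same values per id.
via_nsmallest = df.groupby("id")["value"].nsmallest(n)

print(via_sort)
print(via_nsmallest)
```

Both results hold the same values in the same per-group order; they differ only in shape, a DataFrame in the first case and a Series indexed by id and original row index in the second.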

