How to select the top n rows from each group after groupby in pandas?

Tags: python, pandas

I have a pandas DataFrame with the following columns:

 open_year, open_month, type, col1, col2, ....

I'd like to find the top type in each (year, month), so I first count the occurrences of each type in each (year, month):

freq_df = df.groupby(['open_year','open_month','type']).size().reset_index()
freq_df.columns = ['open_year','open_month','type','count']

Then I want to find the top n types by frequency (i.e. count) for each (year, month). How can I do that?

I can use nlargest, but the result no longer has the type column:

freq_df.groupby(['open_year','open_month'])['count'].nlargest(5)


asked May 18 '18 16:05

HHH

1 Answer

I'd recommend sorting your counts in descending order first, then calling GroupBy.head on the grouped result:

(freq_df.sort_values('count', ascending=False)
        .groupby(['open_year','open_month'], sort=False).head(5)
)
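As a minimal, self-contained sketch of the sort-then-head approach, assuming a hypothetical `freq_df` that already has a `count` column and using `n = 2` instead of 5 to keep the output small:

```python
import pandas as pd

# Hypothetical counts table, shaped like freq_df in the question.
freq_df = pd.DataFrame({
    "open_year":  [2018, 2018, 2018, 2018, 2018],
    "open_month": [1, 1, 1, 2, 2],
    "type":       ["a", "b", "c", "b", "c"],
    "count":      [3, 2, 1, 2, 1],
})

n = 2  # number of rows to keep per (year, month) group

# Sort all rows by count, then keep the first n rows seen in each group.
# head() returns whole rows, so the type column is preserved.
top = (freq_df.sort_values("count", ascending=False)
              .groupby(["open_year", "open_month"], sort=False)
              .head(n))

# Optional: reorder for display, grouping months together again.
top = (top.sort_values(["open_year", "open_month", "count"],
                       ascending=[True, True, False])
          .reset_index(drop=True))
print(top)
```

Because `head` simply takes the first n rows of each group in their current order, ties in `count` are resolved by whatever order `sort_values` leaves them in; add a secondary sort key if that matters for your data.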


answered Oct 23 '22 20:10

cs95
