How to select the top n rows from each group after groupby in pandas?

Tags:

python

pandas

I have a pandas DataFrame with the following columns:

 open_year, open_month, type, col1, col2, ...

I'd like to find the top type in each (year, month), so I first count how many times each type occurs in each (year, month):

freq_df = df.groupby(['open_year','open_month','type']).size().reset_index()
freq_df.columns = ['open_year','open_month','type','count']
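For anyone who wants to try this, here is a self-contained sketch of the counting step on a small made-up DataFrame (the values are hypothetical; only the column names come from the question). It assumes a reasonably recent pandas and uses Series.reset_index(name=...), which names the count column in the same call instead of reassigning .columns afterwards:

```python
import pandas as pd

# Hypothetical sample data: column names match the question, values are made up.
df = pd.DataFrame({
    "open_year":  [2018, 2018, 2018, 2018, 2018, 2018],
    "open_month": [1, 1, 1, 1, 2, 2],
    "type":       ["a", "a", "b", "c", "b", "b"],
})

# size() counts rows per (year, month, type) and returns a Series;
# reset_index(name="count") turns it back into a DataFrame with a named count column.
freq_df = (
    df.groupby(["open_year", "open_month", "type"])
      .size()
      .reset_index(name="count")
)
print(freq_df)
```

The result has one row per (open_year, open_month, type) combination, in sorted group order, which is the same shape as freq_df above.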

Then I want to find the top n types by frequency (i.e. count) for each (year, month). How can I do that?

I can use nlargest:

freq_df.groupby(['open_year','open_month'])['count'].nlargest(5)

but the result only contains the count values, so the type column is lost.
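One possible workaround that keeps nlargest (not the approach in the answer below, just an alternative sketch): the Series returned by SeriesGroupBy.nlargest still carries the original row labels as the last level of its index, so those labels can be used to pull the full rows, type included, back out of freq_df. The sample counts and the variable n below are hypothetical:

```python
import pandas as pd

# Hypothetical per-(year, month, type) counts, shaped like freq_df in the question.
freq_df = pd.DataFrame({
    "open_year":  [2018, 2018, 2018, 2018, 2018],
    "open_month": [1, 1, 1, 2, 2],
    "type":       ["a", "b", "c", "a", "b"],
    "count":      [5, 9, 7, 3, 4],
})

n = 2  # hypothetical: number of top types to keep per (year, month)

# nlargest returns only the 'count' values; the last index level
# holds the original row labels of freq_df.
top = freq_df.groupby(["open_year", "open_month"])["count"].nlargest(n)

# Select the full rows (including 'type') by those original labels.
top_rows = freq_df.loc[top.index.get_level_values(-1)]
print(top_rows)
```

Using level -1 rather than a fixed level number means the lookup still works whether or not the group keys were prepended to the index.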

asked May 18 '18 at 16:05 by HHH


1 Answer

I'd recommend sorting your counts in descending order first and then calling GroupBy.head:

(freq_df.sort_values('count', ascending=False)
        .groupby(['open_year','open_month'], sort=False).head(5)
)
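A runnable sketch of the same sort-then-head approach on a small hypothetical freq_df (the sample values and the variable n are made up for illustration). Because head keeps rows in the order they appear in the sorted frame, the first n rows of each group are that group's n largest counts; the final sort only regroups the output for readability:

```python
import pandas as pd

# Hypothetical counts shaped like freq_df in the question.
freq_df = pd.DataFrame({
    "open_year":  [2018, 2018, 2018, 2018, 2018],
    "open_month": [1, 1, 1, 2, 2],
    "type":       ["a", "b", "c", "a", "b"],
    "count":      [5, 9, 7, 3, 4],
})

n = 2  # hypothetical: number of top types per (year, month)

top_n = (
    freq_df.sort_values("count", ascending=False)    # largest counts first overall
           .groupby(["open_year", "open_month"], sort=False)
           .head(n)                                   # first n rows of each group
)

# Optional: order the output by (year, month), then by descending count.
result = top_n.sort_values(["open_year", "open_month", "count"],
                           ascending=[True, True, False])
print(result)
```

If several types share the same count, which of them survive the cut depends on the sort order; passing kind="mergesort" (a stable sort) to the first sort_values makes ties resolve by original row order.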
answered Oct 23 '22 at 20:10 by cs95
