I have a pandas DataFrame with the following columns:
open_year, open_month, type, col1, col2, ....
I'd like to find the top type in each (year, month), so I first compute the count of each type in each (year, month):
freq_df = df.groupby(['open_year','open_month','type']).size().reset_index()
freq_df.columns = ['open_year','open_month','type','count']
Then I want to find the top n types by frequency (i.e. count) for each (year, month). How can I do that?
I can use nlargest:
freq_df.groupby(['open_year','open_month'])['count'].nlargest(5)
but the result is missing the type column.
DataFrame.head(n) returns the first n rows of a DataFrame; if n is not given, it defaults to 5. A negative n returns all rows except the last |n| rows.
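As a small illustration of the head() behaviour described above, here is a sketch on a made-up frame (the name demo and its single column x are hypothetical, not part of the question's data):

```python
import pandas as pd

# Hypothetical toy frame with ten rows, for illustration only
demo = pd.DataFrame({"x": range(10)})

first_five = demo.head()      # default n=5: rows 0..4
first_three = demo.head(3)    # rows 0..2
all_but_two = demo.head(-2)   # every row except the last 2: rows 0..7
```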
I'd recommend sorting by count in descending order first and then calling GroupBy.head, which keeps the first n rows of each group along with all of their columns, including type:
(freq_df.sort_values('count', ascending=False)
.groupby(['open_year','open_month'], sort=False).head(5)
)
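To make the approach concrete, here is a minimal end-to-end sketch. The sample data and n = 2 are invented for illustration; only the column names come from the question. The final sort_values and reset_index calls are an optional extra that makes the output easier to read, not something the approach requires.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question
df = pd.DataFrame({
    "open_year":  [2020] * 6 + [2021] * 4,
    "open_month": [1] * 6 + [2] * 4,
    "type":       ["a", "a", "a", "b", "b", "c", "x", "y", "y", "y"],
})

# Count each type per (year, month); name= labels the count column directly
freq_df = (
    df.groupby(["open_year", "open_month", "type"])
      .size()
      .reset_index(name="count")
)

n = 2  # assumed value of n for this example

# Sort by count descending, then keep the first n rows of each (year, month)
top_n = (
    freq_df.sort_values("count", ascending=False)
           .groupby(["open_year", "open_month"], sort=False)
           .head(n)
           # optional: tidy ordering for display
           .sort_values(["open_year", "open_month", "count"],
                        ascending=[True, True, False])
           .reset_index(drop=True)
)

print(top_n)
```

Because head returns whole rows rather than a single column, type stays in the result. If two types tie on count at the cutoff, which one is kept depends on the sort order, so add a secondary sort key if that matters.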