I have a dataframe that is grouped by state and aggregated to total revenue where sector and name are ignored. I would now like to break the underlying dataset out to show state, sector, name and the top 2 by revenue in a certain order(i have a created an index from a previous dataframe that lists states in a certain order). Using the below example, I would like to use my sorted index (Kentucky, California, New York) that lists only the top two results per state (in previously stated order by Revenue): Dataset:
State Sector Name Revenue
California 1 Tom 10
California 2 Harry 20
California 3 Roger 30
California 2 Jim 40
Kentucky 2 Bob 15
Kentucky 1 Roger 25
Kentucky 3 Jill 45
New York 1 Sally 50
New York 3 Harry 15
End Goal Dataframe:
State Sector Name Revenue
Kentucky 3 Jill 45
Kentucky 1 Roger 25
California 2 Jim 40
California 3 Roger 30
New York 1 Sally 50
New York 3 Harry 15
You could use a groupby
in conjunction with apply
:
df.groupby('State').apply(lambda grp: grp.nlargest(2, 'Revenue'))
Output:
Sector Name Revenue
State State
California California 2 Jim 40
California 3 Roger 30
Kentucky Kentucky 3 Jill 45
Kentucky 1 Roger 25
New York New York 1 Sally 50
New York 3 Harry 15
Then you can drop the first level of the MultiIndex
to get the result you're after:
df.index = df.index.droplevel()
Output:
Sector Name Revenue
State
California 2 Jim 40
California 3 Roger 30
Kentucky 3 Jill 45
Kentucky 1 Roger 25
New York 1 Sally 50
New York 3 Harry 15
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With