I have a DataFrame that looks like this:
var1 var2 count
A abc 4
A abc 3
A abc 2
A abc 1
A abc 1
B abc 7
B abc 5
B abc 2
B abc 1
B abc 1
C abc 4
C abc 3
C abc 2
C abc 1
C abc 1
....
I want to create a new DataFrame with the top 3 'count' values from each var1 group. It should look like this:
var1 var2 count
A abc 4
A abc 3
A abc 2
B abc 7
B abc 5
B abc 2
C abc 4
C abc 3
C abc 2
....
Is there a convenient way to do this in Python using head()?
Solution with set_index, groupby and SeriesGroupBy.nlargest:
df = df.set_index('var2').groupby("var1")['count'].nlargest(3).reset_index()
print(df)
var1 var2 count
0 A abc 4
1 A abc 3
2 A abc 2
3 B abc 7
4 B abc 5
5 B abc 2
6 C abc 4
7 C abc 3
8 C abc 2
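A self-contained version of the set_index/nlargest solution, for trying it out directly. The sample frame is rebuilt from the rows shown in the question only (the elided "...." rows are left out). var2 is moved into the index so it is carried along when nlargest works on the 'count' Series, and reset_index turns the (var1, var2) index levels back into columns.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (the "...." rows are omitted)
df = pd.DataFrame({
    "var1": list("AAAAABBBBBCCCCC"),
    "var2": ["abc"] * 15,
    "count": [4, 3, 2, 1, 1, 7, 5, 2, 1, 1, 4, 3, 2, 1, 1],
})

# Keep var2 in the index, take the 3 largest counts per var1 group,
# then turn the (var1, var2) index levels back into ordinary columns
top3 = (
    df.set_index("var2")
      .groupby("var1")["count"]
      .nlargest(3)
      .reset_index()
)
print(top3)
```

Note that nlargest defaults to keep="first", so if several rows tie for third place only the first of them is kept; pass keep="all" to keep every tied row.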
If the count column is already sorted in descending order within each group, you can simply use groupby.head to take the first three rows from each group:
df.groupby("var1").head(3)
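When the data is not already sorted, one option is to sort it yourself before calling head. The sketch below uses a small hypothetical unsorted sample in the same shape as the question's data; sorting by var1 and then by count descending means head(3) sees each group's largest counts first, and the result comes out grouped by var1.

```python
import pandas as pd

# Hypothetical unsorted sample in the same shape as the question's data
df = pd.DataFrame({
    "var1": ["A", "B", "A", "C", "B", "A", "C", "B", "C", "A"],
    "var2": ["abc"] * 10,
    "count": [1, 2, 4, 3, 7, 3, 1, 5, 4, 2],
})

# Sort by group, then by count descending, so head(3) takes each group's top 3 counts
top3 = (
    df.sort_values(["var1", "count"], ascending=[True, False])
      .groupby("var1")
      .head(3)
)
print(top3)
```

groupby.head keeps rows in the order of the frame it is called on, which is why sorting on both columns (not just count) gives the A, B, C grouping in the output.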
Otherwise, you can group the data frame by var1 and use nlargest to retrieve the three rows with the top 3 counts:
df.groupby("var1", group_keys=False).apply(lambda g: g.nlargest(3, "count"))
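A variant of the same nlargest idea that avoids apply: SeriesGroupBy.nlargest returns a Series indexed by (var1, original row label), so the second index level can be used to pick the full rows back out of df with loc. This is a sketch that assumes df has a unique index (for example the default RangeIndex); the sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical unsorted sample; the default RangeIndex gives each row a unique label
df = pd.DataFrame({
    "var1": ["A", "B", "A", "C", "B", "A", "C", "B", "C", "A"],
    "var2": ["abc"] * 10,
    "count": [1, 2, 4, 3, 7, 3, 1, 5, 4, 2],
})

# Top 3 counts per var1 group; the result's index is (var1, original row label)
largest = df.groupby("var1")["count"].nlargest(3)

# Use the original row labels (index level 1) to select the complete rows from df
top3 = df.loc[largest.index.get_level_values(1)]
print(top3)
```

Because the rows are selected by label, all columns of df are kept without any set_index/reset_index round trip.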