I've been trying to figure out how I can return just the first group, after I apply groupby.
My code looks like this:
gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()
What I want is for just that first group to be output. I've been trying the get_group method, but it keeps failing (maybe because I am grouping by multiple columns?).
Here is an example of my output:
col1  col2  col3    col4  'sum'
1     34    green   10    0.0
            yellow  30    1.5
            orange  20    1.1
2     89    green   10    3.0
            yellow  5     0.0
            orange  10    1.0
What I want to be returned is just this:
col1  col2  col3    col4  'sum'
1     34    green   10    0.0
            yellow  30    1.5
            orange  20    1.1
(Note: I added the 'sum' header here just to make it clear what the last column is; pandas does not actually name that column.)
The pandas groupby nth() method returns the nth row of each group. To get the first row of each group, pass 0 as the argument to nth().
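A small sketch of nth(0), in case it helps. The DataFrame below is made up to resemble the output in the question (the real df is not shown), and here I group only on col1 and col2. Keep in mind that this gives one row per group, not the whole first group, so it may not be what the question is after.

```python
import pandas as pd

# Hypothetical data shaped like the output in the question.
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange'] * 2,
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

# nth(0) picks the first col5 value within each (col1, col2) group.
# The index of the result differs between pandas versions, so only
# the values are relied on here.
first_rows = df.groupby(['col1', 'col2'])['col5'].nth(0)
print(first_rows)
```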
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
You can use get_group together with groups:
g = df.groupby(['col1', 'col2'])
g.get_group(list(g.groups)[0]).groupby(['col3', 'col4'])['col5'].sum()
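Here is the same idea spelled out step by step as a runnable sketch. The DataFrame is made up to match the output in the question, and first_key, first_group and result are just illustrative names. With several grouping columns, get_group expects the key as a tuple, which may be why the earlier attempts failed.

```python
import pandas as pd

# Hypothetical data shaped like the output in the question.
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange'] * 2,
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

g = df.groupby(['col1', 'col2'])

# g.groups maps each group key (a tuple here) to its row labels;
# take the first key and pull out the rows of that group.
first_key = list(g.groups)[0]
first_group = g.get_group(first_key)

# Sum col5 within the first group only, by the remaining columns.
result = first_group.groupby(['col3', 'col4'])['col5'].sum()
print(result)
```

Note that groupby sorts the keys by default, so col3 comes back in alphabetical order rather than in the order shown in the question.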
for group_id, group_df in df.groupby(['col1', 'col2', 'col3', 'col4']):
    break
Iterate over your groupby object and stop after the first iteration. The variables group_id and group_df will then hold the key and the rows of your first group.
It's kind of an ugly workaround, but it works.
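If the for/break feels too hacky, next(iter(...)) does the same thing in one line. This is only a sketch: the DataFrame is made up to match the question, and I group on col1 and col2 only, on the assumption that the wanted "first group" is the whole (1, 34) block rather than a single row.

```python
import pandas as pd

# Hypothetical data shaped like the output in the question.
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange'] * 2,
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

# Iterating a groupby yields (key, sub-DataFrame) pairs;
# next(iter(...)) takes the first pair and stops there.
group_id, group_df = next(iter(df.groupby(['col1', 'col2'])))

print(group_id)
print(group_df)
```

Grouping on all four columns instead would make the first group a single row, since each combination of col1 to col4 is its own group.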