I have no idea how to call this operation, so I couldn't really google anything, but here's what I'm trying to do:
I have this dataframe:
df = pd.DataFrame({"name": ["A", "B", "B", "B", "A", "A", "B"], "value":[3, 1, 2, 0, 5, 2, 3]})
df
name value
0 A 3
1 B 1
2 B 2
3 B 0
4 A 5
5 A 2
6 B 3
And I want to group it on df.name
and apply a max
function on df.values
but only if the names are in sequence. So my desired result is as follows:
df.groupby_sequence("name")["value"].agg(max)
name value
0 A 3
1 B 2
2 A 5
3 B 3
Any clue how to do this?
Using pandas
, you can groupby when the name changes from row to row, using (df.name!=df.name.shift()).cumsum()
, which essentially groups together consecutive names:
>>> df.groupby((df.name!=df.name.shift()).cumsum()).max().reset_index(drop=True)
name value
0 A 3
1 B 2
2 A 5
3 B 3
Not exactly a pandas solution, but you could use groupby from itertools:
from operator import itemgetter
import pandas as pd
from itertools import groupby
df = pd.DataFrame({"name": ["A", "B", "B", "B", "A", "A", "B"], "value":[3, 1, 2, 0, 5, 2, 3]})
result = [max(group, key=itemgetter(1)) for k, group in groupby(zip(df.name, df.value), key=itemgetter(0))]
print(result)
Output
[('A', 3), ('B', 2), ('A', 5), ('B', 3)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With