Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas groupby sequential values

I have no idea how to call this operation, so I couldn't really google anything, but here's what I'm trying to do:

I have this dataframe:

df = pd.DataFrame({"name": ["A", "B", "B", "B", "A", "A", "B"], "value":[3, 1, 2, 0, 5, 2, 3]})
df
  name  value
0    A      3
1    B      1
2    B      2
3    B      0
4    A      5
5    A      2
6    B      3

And I want to group it on df.name and apply a max function on df.values but only if the names are in sequence. So my desired result is as follows:

df.groupby_sequence("name")["value"].agg(max)
  name  value
0    A      3
1    B      2
2    A      5
3    B      3

Any clue how to do this?

like image 846
Jurgy Avatar asked Oct 31 '18 20:10

Jurgy


2 Answers

Using pandas, you can groupby when the name changes from row to row, using (df.name!=df.name.shift()).cumsum(), which essentially groups together consecutive names:

>>> df.groupby((df.name!=df.name.shift()).cumsum()).max().reset_index(drop=True)
  name  value
0    A      3
1    B      2
2    A      5
3    B      3
like image 155
sacuL Avatar answered Sep 28 '22 07:09

sacuL


Not exactly a pandas solution, but you could use groupby from itertools:

from operator import itemgetter

import pandas as pd
from itertools import groupby

df = pd.DataFrame({"name": ["A", "B", "B", "B", "A", "A", "B"], "value":[3, 1, 2, 0, 5, 2, 3]})

result = [max(group, key=itemgetter(1)) for k, group in groupby(zip(df.name, df.value), key=itemgetter(0))]

print(result)

Output

[('A', 3), ('B', 2), ('A', 5), ('B', 3)]
like image 37
Dani Mesejo Avatar answered Sep 28 '22 08:09

Dani Mesejo