What I want to do is create groups based on the last column (Key) so that consecutive rows with the same key form one group: the first group contains row 0, the second rows 1 and 2, and the third the last two rows (3 and 4).
I tried pandas' groupby method, grouped = df1.groupby('Key'), but as expected it returns only two groups, because it puts every row with the same key into one group regardless of its position.
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'Key': ['K0', 'K1', 'K1', 'K0', 'K0']},
                   index=[0, 1, 2, 3, 4])
print(df1)
    A   B   C   D Key
0  A0  B0  C0  D0  K0
1  A1  B1  C1  D1  K1
2  A2  B2  C2  D2  K1
3  A3  B3  C3  D3  K0
4  A4  B4  C4  D4  K0
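To make the problem concrete, here is a minimal check of what a plain groupby on Key produces for the frame above (rebuilt so the snippet runs on its own). The name naive_groups is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'Key': ['K0', 'K1', 'K1', 'K0', 'K0']})

# A plain groupby collects every row with the same key value, no matter
# where it sits, so rows 0, 3 and 4 all end up in the K0 group.
naive = df1.groupby('Key')
naive_groups = {key: idx.tolist() for key, idx in naive.groups.items()}
print(naive_groups)  # {'K0': [0, 3, 4], 'K1': [1, 2]}
```

Row 0 is merged with rows 3 and 4, which is exactly what the question wants to avoid.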
You can use shift and cumsum, then groupby:
grouped = df1.groupby((df1.Key != df1.Key.shift()).cumsum())
grouped.groups
{1: Int64Index([0], dtype='int64'),
2: Int64Index([1, 2], dtype='int64'),
3: Int64Index([3, 4], dtype='int64')}
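A short sketch of how the resulting groups might be consumed: iterating over the groupby object yields each run label together with its sub-DataFrame. The names run_id and runs are illustrative, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'Key': ['K0', 'K1', 'K1', 'K0', 'K0']})

# Label each run of consecutive identical keys, then group by that label.
run_id = (df1.Key != df1.Key.shift()).cumsum()
grouped = df1.groupby(run_id)

# Iterating yields (run label, sub-DataFrame) pairs, ordered by run label.
runs = [group for _, group in grouped]
for group in runs:
    print(group.index.tolist(), group.Key.iloc[0])
```

This should print the row indices [0], [1, 2] and [3, 4], with keys K0, K1 and K0 respectively, so the two K0 runs stay separate.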
To show why this works:
(df1.Key != df1.Key.shift()).cumsum()
0 1
1 2
2 2
3 3
4 3
Name: Key, dtype: int32
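Breaking the expression into its intermediate steps may make this clearer. The sketch below uses a standalone Series with the same keys; the names shifted, changed and run_id are only for illustration.

```python
import pandas as pd

key = pd.Series(['K0', 'K1', 'K1', 'K0', 'K0'], name='Key')

shifted = key.shift()      # previous row's key; NaN for the first row
changed = key != shifted   # True where a new run starts (NaN never equals, so row 0 is True)
run_id = changed.cumsum()  # True counts as 1, so each new run bumps the counter

print(pd.DataFrame({'Key': key, 'shifted': shifted,
                    'changed': changed, 'run_id': run_id}))
```

Here changed is [True, True, False, True, False] and run_id is [1, 2, 2, 3, 3], matching the output above.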
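The comparison df1.Key != df1.Key.shift() is True for the first row and for every row whose Key differs from the previous row. cumsum turns those True values into a running counter, so the counter increases by one each time Key changes, even if the new value has been seen before. Grouping by this new Series therefore groups only consecutive repeated keys.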
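If you also want the original key visible in each group's label, one option (a sketch, not part of the original answer) is to group by the run counter and the Key column together. The label name 'run' is an arbitrary choice; renaming avoids a clash, since the counter inherits the name Key from df1.Key.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'Key': ['K0', 'K1', 'K1', 'K0', 'K0']})

# Run counter, renamed so it does not share the name 'Key' with the column.
run_id = (df1.Key != df1.Key.shift()).cumsum().rename('run')

# Key is constant within a run, so adding it does not split any group;
# it only makes each group label a (run, Key) pair.
grouped = df1.groupby([run_id, 'Key'])
sizes = grouped.size()
print(sizes)
```

The resulting labels are (1, 'K0'), (2, 'K1') and (3, 'K0'), with sizes 1, 2 and 2.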