Need to increase the sequential numbers if certain condition is met or else need to hold the previous number.
Original_dataset:
ID | Name | Status | Cluster | Gap |
---|---|---|---|---|
1 | A | 0 | 1 | 15 |
1 | B | 1 | 1 | 35 |
1 | C | 1 | 1 | 03 |
2 | B | 0 | 1 | 26 |
2 | C | 0 | 1 | 16 |
3 | A | 1 | 1 | 65 |
3 | C | 0 | 1 | 89 |
3 | F | 0 | 1 | 19 |
Required_Dataset:
ID | Name | Status | Cluster | Gap |
---|---|---|---|---|
1 | A | 0 | 1 | 15 |
1 | B | 1 | 2 | 35 |
1 | C | 1 | 3 | 03 |
2 | B | 0 | 1 | 26 |
2 | C | 0 | 1 | 16 |
3 | A | 1 | 1 | 65 |
3 | C | 0 | 2 | 89 |
3 | F | 0 | 2 | 19 |
Conditions:
The code which I have tried is:
Original_dataset.loc[((new_df4['gap'] > 28) | (Original_dataset['status'] == 1)),'Cluster'] = Original_dataset['Cluster'] + 1
First set 1
if greater GAP like 28
or if Status
is 1
or first duplicated value of ID
in DataFrame.loc
, then use lambda function with GroupBy.cumsum
and GroupBy.ffill
m = df['Gap'].gt(28)
m1 = df['Status'].eq(1)
m2 = ~df['ID'].duplicated()
df.loc[m | m1 | m2, 'Cluster1'] = 1
f = lambda x: x.cumsum().ffill(downcast='int')
df['Cluster1'] = df.groupby('ID')['Cluster1'].transform(f)
print (df)
ID Name Status Cluster Gap Cluster1
0 1 A 0 1 15 1
1 1 B 1 1 35 2
2 1 C 1 1 3 3
3 2 B 0 1 26 1
4 2 C 0 1 16 1
5 3 A 1 1 65 1
6 3 C 0 1 89 2
7 3 F 0 1 19 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With