Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to add the sequential numbers if certain condition met in Pandas

Need to increase the sequential numbers if certain condition is met or else need to hold the previous number.

Original_dataset:

ID Name Status Cluster Gap
1 A 0 1 15
1 B 1 1 35
1 C 1 1 03
2 B 0 1 26
2 C 0 1 16
3 A 1 1 65
3 C 0 1 89
3 F 0 1 19

Required_Dataset:

ID Name Status Cluster Gap
1 A 0 1 15
1 B 1 2 35
1 C 1 3 03
2 B 0 1 26
2 C 0 1 16
3 A 1 1 65
3 C 0 2 89
3 F 0 2 19

Conditions:

  1. For first occurrence of ID, the cluster should be 1.
  2. If status = 1 or Gap > 28, then cluster needs to increase by 1 based on patient ID (see row 1-C and 2-B --- as ID changes, the Cluster remains 1 as it is the first occurrence of the particular ID).
  3. If the condition is not satisfied, it needs to hold the previous cluster number. (Can refer the final row).

The code which I have tried is:

Original_dataset.loc[((new_df4['gap'] > 28) | (Original_dataset['status'] == 1)),'Cluster'] = Original_dataset['Cluster'] + 1
like image 212
user20933477 Avatar asked Sep 02 '25 02:09

user20933477


1 Answers

First set 1 if greater GAP like 28 or if Status is 1 or first duplicated value of ID in DataFrame.loc, then use lambda function with GroupBy.cumsum and GroupBy.ffill

m = df['Gap'].gt(28)
m1 = df['Status'].eq(1)
m2 = ~df['ID'].duplicated()

df.loc[m | m1 | m2, 'Cluster1'] = 1

f = lambda x: x.cumsum().ffill(downcast='int')
df['Cluster1'] = df.groupby('ID')['Cluster1'].transform(f)
print (df)
   ID Name  Status  Cluster  Gap  Cluster1
0   1    A       0        1   15         1
1   1    B       1        1   35         2
2   1    C       1        1    3         3
3   2    B       0        1   26         1
4   2    C       0        1   16         1
5   3    A       1        1   65         1
6   3    C       0        1   89         2
7   3    F       0        1   19         2
like image 81
jezrael Avatar answered Sep 07 '25 16:09

jezrael