I have the dataframe below:
id action
================
10 CREATED
10 111
10 222
10 333
10 DONE
10 222
10 UPDATED
777 CREATED
10 333
10 DONE
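I would like to create a new column "check" based on data in previous rows of the dataframe: on each DONE row it should hold the first letter (C or U) of the most recent CREATED or UPDATED action for the same id, and it should be empty on every other row.
Expected output: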
id action check
================
10 CREATED
10 111
10 222
10 333
10 DONE C
10 222
10 UPDATED
777 CREATED
10 333
10 DONE U
I tried to use multiple if conditions, but that did not work for me. Can you please help?
Consider a more sophisticated sample dataframe for illustration:
# print(df)
id action
10 CREATED
10 111
10 222
10 333
10 DONE
10 222
10 UPDATED
777 CREATED
10 333
10 DONE
777 DONE
10 CREATED
10 DONE
11 UPDATED
11 DONE
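To reproduce the sample dataframe printed above, something like the following should work. It is only a sketch: it assumes that action is a string column, so codes such as 111 are stored as text, and that the index is the default 0..14.

```python
import pandas as pd

# Sample data copied from the printed frame above.
# Assumption: "action" holds strings, including the numeric-looking codes.
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 10, 10, 10, 777, 10, 10, 777, 10, 10, 11, 11],
    "action": ["CREATED", "111", "222", "333", "DONE", "222", "UPDATED",
               "CREATED", "333", "DONE", "DONE", "CREATED", "DONE",
               "UPDATED", "DONE"],
})

print(df)
```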
Use:
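# For one DONE-delimited part: the last CREATED/UPDATED value in it
# (idxmax returns the first label where the running count peaks)
transformer = lambda s: s[(s.eq('CREATED') | s.eq('UPDATED')).cumsum().idxmax()]
# For one id: split the rows into parts that each end at a DONE row
grouper = (
    lambda g: g.groupby(
        g['action'].eq('DONE').cumsum().shift().fillna(0))['action']
    .transform(transformer)
)
# Keep the first letter (C or U), realigned to the original index
df['check'] = df.groupby('id').apply(grouper).droplevel(0).str[0]
# Only DONE rows carry a value
df.loc[df['action'].ne('DONE'), 'check'] = ''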
Explanation:
First we group the dataframe on id and apply the grouper function to each group. Inside grouper, the group is split again, this time on the running count of DONE values in the action column, shifted down by one row so that each DONE row stays with the rows that lead up to it. Each part therefore ends at a DONE value. The transformer lambda then fills each part with the last CREATED or UPDATED value found in it, that is, the one that most recently precedes the DONE. Finally, .str[0] keeps only the first letter (C or U), and the last line blanks out check on every row whose action is not DONE. Note that a part containing no CREATED or UPDATED row falls back to the first action value of that part.
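To see the two building blocks of the code above in isolation, here is a small sketch on just the id 10 rows of the sample dataframe. The names g, segment, part, is_marker and last_marker_label are made up for this illustration and do not appear in the answer's code.

```python
import pandas as pd

# Rows with id 10 from the sample frame, keeping their original index labels.
g = pd.DataFrame(
    {
        "id": 10,
        "action": ["CREATED", "111", "222", "333", "DONE",
                   "222", "UPDATED", "333", "DONE", "CREATED", "DONE"],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12],
)

# Segment key: how many DONE rows were seen strictly before each row.
# The shift keeps every DONE row in the same segment as the rows before it.
segment = g["action"].eq("DONE").cumsum().shift().fillna(0).astype(int)
print(segment.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2]

# Look at the second segment (labels 5, 6, 8, 9) on its own.
part = g.loc[segment.eq(1), "action"]
is_marker = part.isin(["CREATED", "UPDATED"])

# The running count of markers peaks at the last marker and stays there;
# idxmax returns the first label holding that peak, i.e. the last marker.
last_marker_label = is_marker.cumsum().idxmax()
print(last_marker_label, part[last_marker_label])  # 6 UPDATED
```

The shift is what makes each part end at a DONE row. Without it, a DONE row would open the next part instead of closing its own.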
Result:
# print(df)
id action check
0 10 CREATED
1 10 111
2 10 222
3 10 333
4 10 DONE C
5 10 222
6 10 UPDATED
7 777 CREATED
8 10 333
9 10 DONE U
10 777 DONE C
11 10 CREATED
12 10 DONE C
13 11 UPDATED
14 11 DONE U
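For comparison, here is a shorter sketch that gives the same table on this sample data. It is a different technique from the answer above, not a rewrite of it: it keeps only the CREATED/UPDATED values, forward-fills them within each id, and then masks everything except the DONE rows. The names markers and last_marker are made up for this example, and the action column is assumed to hold strings.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [10, 10, 10, 10, 10, 10, 10, 777, 10, 10, 777, 10, 10, 11, 11],
    "action": ["CREATED", "111", "222", "333", "DONE", "222", "UPDATED",
               "CREATED", "333", "DONE", "DONE", "CREATED", "DONE",
               "UPDATED", "DONE"],
})

# Keep CREATED/UPDATED, turn every other action into NaN.
markers = df["action"].where(df["action"].isin(["CREATED", "UPDATED"]))

# Within each id, carry the most recent marker forward to later rows.
last_marker = markers.groupby(df["id"]).ffill()

# First letter of that marker on DONE rows, empty string everywhere else.
df["check"] = last_marker.str[0].where(df["action"].eq("DONE"), "")

print(df)
```

The two approaches can differ on edge cases. Here a marker is not reset by a DONE, so a second DONE with no new CREATED or UPDATED in between repeats the earlier letter, and a DONE with no earlier marker for its id is left as NaN.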