I have a monthly "points" dataframe whose values come from cumsum():
ID month1 month2 month3 month4
000 0 10 45 55
111 40 60 100 100
I also have a "buy" dataframe, which indicates whether there will be a purchase in each month:
ID month1 month2 month3 month4
000 NO NO YES NO
111 NO YES NO YES
How do I make a new dataframe whose values satisfy this condition:
IF points > 40 AND buy == "YES"
THEN returns MAX(40, 0.8*points)
ELSE returns 0
The resulting dataframe should be:
ID month1 month2 month3 month4
000 0 0 40 0
111 0 48 0 41.6
ID 111's month4 value is 41.6 because 12 points remain after the month2 purchase (60 - 48 = 12) and another 40 points were added after that, so the balance is 52 and 52*0.8 = 41.6.
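The carry-over arithmetic above can be traced with a short sketch. It assumes that each purchase is deducted from a running balance, and that the monthly gain is the difference between consecutive cumulative totals. The function name `purchases` and its parameters are hypothetical, introduced only for this illustration.

```python
def purchases(cumulative, buy, threshold=40, rate=0.8):
    """Return the amount spent in each month for one ID (illustrative helper)."""
    spent = []
    balance = 0.0    # points currently available
    previous = 0.0   # cumulative total of the previous month
    for total, flag in zip(cumulative, buy):
        balance += total - previous   # points earned this month
        previous = total
        if balance > threshold and flag == "YES":
            amount = max(threshold, rate * balance)
            balance -= amount         # assumption: spent points leave the balance
        else:
            amount = 0
        spent.append(amount)
    return spent


row_000 = purchases([0, 10, 45, 55], ["NO", "NO", "YES", "NO"])
row_111 = purchases([40, 60, 100, 100], ["NO", "YES", "NO", "YES"])
print(row_000)  # month3 is the only purchase for ID 000
print(row_111)  # month2 and month4 are purchases for ID 111
```

Under these assumptions the two rows come out as 0, 0, 40, 0 and approximately 0, 48, 0, 41.6, which agrees with the expected dataframe.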
The easiest would be to merge the two datasets by 'ID':
df = df1.merge(df2, on='ID')
And then use np.where:
df['month1_x'] = np.where((df['month1_x'] > 40) & (df['month1_y'] == 'YES'), np.maximum(40, 0.8*df['month1_x']), 0)
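As a rough sketch, the same merge-and-np.where idea can be applied to every month column in a loop. The frames are rebuilt inline here so the example is self-contained, and the names `months` and `result` are assumptions made for this illustration. Note that this version applies the rule to the cumulative totals as they are and does not deduct earlier purchases.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": ["000", "111"],
                    "month1": [0, 40], "month2": [10, 60],
                    "month3": [45, 100], "month4": [55, 100]})
df2 = pd.DataFrame({"ID": ["000", "111"],
                    "month1": ["NO", "NO"], "month2": ["NO", "YES"],
                    "month3": ["YES", "NO"], "month4": ["NO", "YES"]})

# Overlapping column names get the default suffixes: _x (points) and _y (buy).
df = df1.merge(df2, on="ID")

months = [c for c in df1.columns if c != "ID"]
result = df[["ID"]].copy()
for m in months:
    points, buy = df[f"{m}_x"], df[f"{m}_y"]
    # np.maximum works element-wise, unlike the built-in max.
    result[m] = np.where((points > 40) & (buy == "YES"),
                         np.maximum(40, 0.8 * points), 0)

print(result)
```

Because nothing is carried over, ID 111's month4 comes out as 80 (0.8 * 100) rather than the 41.6 asked for in the question; ID 000's month3 is 40 and ID 111's month2 is 48, as expected.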
Try np.where and assign all columns:
Prepare:
import io
import numpy as np
import pandas as pd

df1 = pd.read_csv(io.StringIO('''ID month1 month2 month3 month4
000 0 10 45 55
111 40 60 100 100'''), sep=r'\s+')
df1
df2 = pd.read_csv(io.StringIO('''ID month1 month2 month3 month4
000 NO NO YES NO
111 NO YES NO YES '''), sep=r'\s+')
df2
df2 = df2.set_index('ID')
Code:
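df = df1.set_index('ID')
# purchase months: 80% of the points exceeds 40 and buy is YES
condition = (df * 0.8 > 40) & (df2 == 'YES')
# keep the points in purchase months, 0 elsewhere
df[df.columns] = np.where(condition, df.values, 0)
# mark non-purchase months as NaN
df[df.columns] = np.where(df * 0.8 > 0, df, np.nan)
# cumulative points minus 80% of the points at the previous purchase
ffill = df.ffill(axis=1) - df.ffill(axis=1).shift(1, axis=1) * 0.8
# use the adjusted value only in purchase months that follow an earlier purchase
df[df.columns] = np.where(((df.isna()) | (ffill.isna())), df, ffill)
# spend 80% of the points; non-purchase months become 0
df = (df.fillna(0) * 0.8).reset_index()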
Output:
ID month1 month2 month3 month4
0 0 0.0 0.0 0.0 0.0
1 111 0.0 48.0 0.0 41.6
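To see where the 41.6 comes from, the ffill/shift step can be traced on ID 111's row alone. This is a minimal sketch that starts from the state after the two np.where calls (NaN in non-purchase months); the variable names are assumptions for this illustration.

```python
import numpy as np
import pandas as pd

cols = ["month1", "month2", "month3", "month4"]
# ID 111 after the two np.where calls: points in purchase months, NaN elsewhere.
row = pd.DataFrame([[np.nan, 60.0, np.nan, 100.0]], columns=cols)

filled = row.ffill(axis=1)            # carry the last purchase's points forward
prev = filled.shift(1, axis=1)        # points at the previous purchase, seen one month later
remaining = filled - prev * 0.8       # NaN, NaN, about 12, about 52

# Keep the original value where either frame is NaN, else use the adjusted value.
adjusted = row.where(row.isna() | remaining.isna(), remaining)
spent = adjusted.fillna(0) * 0.8      # about 0, 48, 0, 41.6
print(spent)
```

Note that the code above compares 0.8*points with 40 rather than points with 40, so ID 000's month3 (45 points, 0.8*45 = 36) produces 0 in the output instead of the 40 given in the question.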