I would like to write the following code in a vectorized way as the current code is pretty slow (and would like to learn Python best practices). Basically, the code is saying that if today's value is within 10% of yesterday's value, then today's value (in a new column) is the same as yesterday's value. Otherwise, today's value is unchanged:
def test(df):
df['OldCol']=(100,115,101,100,99,70,72,75,78,80,110)
df['NewCol']=df['OldCol']
for i in range(1,len(df)-1):
if df['OldCol'][i]/df['OldCol'][i-1]>0.9 and df['OldCol'][i]/df['OldCol'][i-1]<1.1:
df['NewCol'][i]=df['NewCol'][i-1]
else:
df['NewCol'][i]=df['OldCol'][i]
return df['NewCol']
The output should be the following:
OldCol NewCol
0 100 100
1 115 115
2 101 101
3 100 101
4 99 101
5 70 70
6 72 70
7 75 70
8 78 70
9 80 70
10 110 110
Can you please help?
I would like to use something like this but I did not manage to solve my issue:
def test(df):
df['NewCol']=df['OldCol']
cond=np.where((df['OldCol'].shift(1)/df['OldCol']>0.9) & (df['OldCol'].shift(1)/df['OldCol']<1.1))
df['NewCol'][cond[0]]=df['NewCol'][cond[0]-1]
return df
A solution in three steps :
df['variation']=(df.OldCol/df.OldCol.shift())
df['gap']=~df.variation.between(0.9,1.1)
df['NewCol']=df.OldCol.where(df.gap).fillna(method='ffill')
For :
OldCol variation gap NewCol
0 100 nan True 100
1 115 1.15 True 115
2 101 0.88 True 101
3 100 0.99 False 101
4 99 0.99 False 101
5 70 0.71 True 70
6 72 1.03 False 70
7 75 1.04 False 70
8 78 1.04 False 70
9 80 1.03 False 70
10 110 1.38 True 110
It seems to be 30x faster than loops on this exemple.
In one line :
x=df.OldCol;df['NewCol']=x.where(~(x/x.shift()).between(0.9,1.1)).fillna(method='ffill')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With