Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write code in a vectorized way instead of using loops?

I would like to write the following code in a vectorized way as the current code is pretty slow (and would like to learn Python best practices). Basically, the code is saying that if today's value is within 10% of yesterday's value, then today's value (in a new column) is the same as yesterday's value. Otherwise, today's value is unchanged:

def test(df):
    df['OldCol']=(100,115,101,100,99,70,72,75,78,80,110)
    df['NewCol']=df['OldCol']
    for i in range(1,len(df)-1):
        if df['OldCol'][i]/df['OldCol'][i-1]>0.9 and df['OldCol'][i]/df['OldCol'][i-1]<1.1:
            df['NewCol'][i]=df['NewCol'][i-1]
        else:
            df['NewCol'][i]=df['OldCol'][i]
    return df['NewCol']

The output should be the following:

    OldCol  NewCol
0      100     100
1      115     115
2      101     101
3      100     101
4       99     101
5       70      70
6       72      70
7       75      70
8       78      70
9       80      70
10     110     110

Can you please help?

I would like to use something like this but I did not manage to solve my issue:

def test(df):
    df['NewCol']=df['OldCol']
    cond=np.where((df['OldCol'].shift(1)/df['OldCol']>0.9) & (df['OldCol'].shift(1)/df['OldCol']<1.1))
    df['NewCol'][cond[0]]=df['NewCol'][cond[0]-1]     
    return df     
like image 752
crazyfrog Avatar asked Oct 30 '22 12:10

crazyfrog


1 Answers

A solution in three steps :

df['variation']=(df.OldCol/df.OldCol.shift())
df['gap']=~df.variation.between(0.9,1.1)
df['NewCol']=df.OldCol.where(df.gap).fillna(method='ffill')

For :

    OldCol  variation    gap  NewCol
0      100        nan   True     100
1      115       1.15   True     115
2      101       0.88   True     101
3      100       0.99  False     101
4       99       0.99  False     101
5       70       0.71   True      70
6       72       1.03  False      70
7       75       1.04  False      70
8       78       1.04  False      70
9       80       1.03  False      70
10     110       1.38   True     110

It seems to be 30x faster than loops on this exemple.

In one line :

x=df.OldCol;df['NewCol']=x.where(~(x/x.shift()).between(0.9,1.1)).fillna(method='ffill')
like image 180
B. M. Avatar answered Nov 15 '22 05:11

B. M.