How to use the previous N values of a column in an if statement

Question

I have a dataframe df:

df=pd.DataFrame([[47,55,47,50,200], [33,37,30,25,100],[61,65,54,57,300],[25,26,21,22,400], [25,29,23,28,410],[28,34,32,30,430],[32,31,30,28,1000]], columns=['open','high','low','close','volume'])
print(df)

   open  high  low  close  volume
0    47    55   47     50     200
1    33    37   30     25     100
2    61    65   54     57     300
3    25    26   21     22     400
4    25    29   23     28     410
5    28    34   32     30     430
6    32    31   30     28    1000

I want to replace outliers in the volume column with its 75th percentile, using the condition:

if df['volume'] > (3IQR + vol_q3):

where 3IQR is 3 times the IQR of the volume column,

and vol_q3 is the 75th percentile of the last N values of volume (in this case, the last 4 values).

The code I wrote is as below:

from collections import deque
import pandas as pd
import numpy as np

vol_q=deque()

q1 = df['volume'].quantile(0.25)
q3 = df['volume'].quantile(0.75)
iqr_3 = 3*(q3 - q1)

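for idx, rows in df.iterrows():
    # keep a sliding window of the most recent volumes (at most 5 with this check)
    if idx < 5:
        vol_q.append(rows['volume'])
    else:
        vol_q.popleft()
        vol_q.append(rows['volume'])

    # 75th percentile of the current window
    vol_q3 = np.percentile(list(vol_q), 75)

    if rows['volume'] > (iqr_3 + vol_q3):
        # write through df.at: the row from iterrows() may be a copy of the data
        df.at[idx, 'volume'] = q3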

Output :

   open  high  low  close  volume
0    47    55   47     50     200
1    33    37   30     25     100
2    61    65   54     57     300
3    25    26   21     22     400
4    25    29   23     28     410
5    28    34   32     30     430
6    32    31   30     28     420

It works, but it is too slow for the amount of data I have. Is there a faster way to implement it? How can I use the previous N values with apply?

Any suggestions are welcome. Thanks

John Zwinck · Accepted Answer

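v = df.volume # other columns not relevant to question
q = v.rolling(4, min_periods=1).quantile(0.75) # 75th percentile of last 4 (fewer for the first rows, instead of NaN)
r = v.where(v <= iqr_3 + q, q3) # iqr_3 and q3 as computed in the question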

q is the vectorized rolling quantile, which is fast to compute with no loops. r is the result, which is a bit hard to verify from your question because your example data seems to include no values extreme enough to trigger the condition, but I think you see the idea.
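
For reference, here is a self-contained sketch of the same rolling-quantile idea applied to the sample data from the question. The helper name `replace_volume_outliers` and its `window` parameter are hypothetical, and `min_periods=1` is an assumption: it lets the first rows use a shorter window, as the loop in the question does, instead of producing NaN and failing the comparison.

```python
import pandas as pd

df = pd.DataFrame(
    [[47, 55, 47, 50, 200], [33, 37, 30, 25, 100], [61, 65, 54, 57, 300],
     [25, 26, 21, 22, 400], [25, 29, 23, 28, 410], [28, 34, 32, 30, 430],
     [32, 31, 30, 28, 1000]],
    columns=['open', 'high', 'low', 'close', 'volume'])


def replace_volume_outliers(volume, window):
    """Hypothetical helper: replace values above 3*IQR + rolling Q3 with the global Q3."""
    q1 = volume.quantile(0.25)
    q3 = volume.quantile(0.75)
    iqr_3 = 3 * (q3 - q1)
    # Rolling 75th percentile over the current row and the (window - 1) rows before it.
    # min_periods=1 (an assumption) avoids NaN for the first rows.
    rolling_q3 = volume.rolling(window, min_periods=1).quantile(0.75)
    # Keep values that satisfy the condition; replace the rest with the global Q3.
    return volume.where(volume <= iqr_3 + rolling_q3, q3)


cleaned_4 = replace_volume_outliers(df['volume'], window=4)
cleaned_5 = replace_volume_outliers(df['volume'], window=5)
print(pd.DataFrame({'volume': df['volume'], 'window_4': cleaned_4, 'window_5': cleaned_5}))
```

On this sample, a window of 4 leaves every value unchanged, while a window of 5 (the number of values the deque in the question actually holds) replaces the last value, 1000, with 420, matching the output shown in the question. To write the result back, assign it to the column, for example `df['volume'] = cleaned_5`.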

Tags: python, pandas, numpy, outliers

Sociopath
