I have a dataframe like the following:
    A
1   1000
2   1000
3   1001
4   1001
5   10
6   1000
7   1010
8   9
9   10
10  6
11  999
12  10110
13  10111
14  1000 
I am trying to clean my dataframe in the following way: drop every row whose value is more than 1.5 times, or less than 0.5 times, the value of the previous row. If the previous row is itself a to-drop row, the comparison must be made with the closest preceding row that is NOT dropped (for example, indexes 9, 10 and 13 in my dataframe). So the final dataframe should look like this:
    A
1   1000
2   1000
3   1001
4   1001
6   1000
7   1010
11  999
14  1000
My dataframe is really huge, so a solution that performs well would be appreciated.
I'd pass the series to a generator function that yields the index values of the rows that satisfy the condition.
def f(s):
    it = s.items()                   # `iteritems` was removed in pandas 2.0
    i, v = next(it)
    yield i                          # Yield the first one
    for j, x in it:
        if .5 * v <= x <= 1.5 * v:
            yield j                  # Yield the ones that satisfy
            v = x                    # Update the comparative value
df.loc[list(f(df.A))]                # Use `loc` with index values
                                     # yielded by my generator
       A
1   1000
2   1000
3   1001
4   1001
6   1000
7   1010
11   999
14  1000
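Since the question mentions a very large dataframe, one possible variation on the same idea is to run the loop over a plain Python list of values and build a boolean mask, rather than iterating over the Series itself. This is only a minimal sketch: `keep_mask` and its `lo`/`hi` parameters are hypothetical names, not part of the answer above, and whether it is actually faster on your data is something you would need to measure.

```python
def keep_mask(values, lo=0.5, hi=1.5):
    """Return a list of booleans, True for every value that is kept.

    Each value is compared with the last *kept* value, not with the
    immediately preceding one. `lo` and `hi` are illustrative names.
    """
    mask = [False] * len(values)
    if not values:
        return mask
    last = values[0]
    mask[0] = True                    # the first value is always kept
    for k in range(1, len(values)):
        x = values[k]
        if lo * last <= x <= hi * last:
            mask[k] = True
            last = x                  # update the comparison value
    return mask


values = [1000, 1000, 1001, 1001, 10, 1000, 1010, 9, 10, 6, 999, 10110, 10111, 1000]
index = list(range(1, 15))            # same labels as the sample dataframe
kept = [i for i, m in zip(index, keep_mask(values)) if m]
print(kept)
```

With the dataframe from the question, the mask could then be applied as `df[keep_mask(df.A.tolist())]`.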
One alternative is to use itertools.accumulate to carry forward the last valid value, and then filter out the rows where that carried value differs from the original, e.g.:
from itertools import accumulate
def change(x, y, pct=0.5):
    # keep y if it lies within +/- pct of the last valid value x,
    # otherwise carry x forward
    if (1 - pct) * x <= y <= (1 + pct) * x:
        return y
    return x
# create a mask filtering out the values that are different from the original A
mask = (df.A == list(accumulate(df.A, change)))
print(df[mask])
Output
       A
1   1000
2   1000
3   1001
4   1001
6   1000
7   1010
11   999
14  1000
Just to get an idea, see how the accumulated column (change) compares to the original side-by-side:
        A  change
1    1000    1000
2    1000    1000
3    1001    1001
4    1001    1001
5      10    1001
6    1000    1000
7    1010    1010
8       9    1010
9      10    1010
10      6    1010
11    999     999
12  10110     999
13  10111     999
14   1000    1000
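For reference, the carried-forward column shown above can be reproduced along these lines. This is a sketch on a plain list so that it runs on its own; `change` is restated here with the lower bound written as `(1 - pct)`, which is the same as `pct` at the default of 0.5, and the names `values` and `carried` are just illustrative.

```python
from itertools import accumulate


def change(x, y, pct=0.5):
    # keep y if it is within +/- pct of the last valid value x,
    # otherwise carry x forward
    if (1 - pct) * x <= y <= (1 + pct) * x:
        return y
    return x


values = [1000, 1000, 1001, 1001, 10, 1000, 1010, 9, 10, 6, 999, 10110, 10111, 1000]

# accumulate feeds the previous result back in as x, so `carried`
# always holds the last valid value seen so far
carried = list(accumulate(values, change))

for label, (a, c) in enumerate(zip(values, carried), start=1):
    print(f"{label:<3}{a:>6}{c:>8}")
```

On the dataframe itself, something like `df.assign(change=list(accumulate(df.A, change)))` should give the same two columns.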
Update
To set pct explicitly in the call, wrap change in a lambda:
mask = (df.A == list(accumulate(df.A, lambda x, y : change(x, y, pct=0.5))))
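As a possible alternative to the lambda, `functools.partial` can bind `pct` up front. The sketch below works on a plain list so that it is self-contained; `change` is restated with the lower bound as `(1 - pct)` (an assumption that matches the original at 0.5), and `change_50` is a hypothetical name.

```python
from functools import partial
from itertools import accumulate


def change(x, y, pct=0.5):
    # keep y if it is within +/- pct of the last valid value x,
    # otherwise carry x forward
    if (1 - pct) * x <= y <= (1 + pct) * x:
        return y
    return x


values = [1000, 1000, 1001, 1001, 10, 1000, 1010, 9, 10, 6, 999, 10110, 10111, 1000]

# partial fixes the keyword argument; accumulate still passes x and y positionally
change_50 = partial(change, pct=0.5)
carried = list(accumulate(values, change_50))

# True where the original value survived, i.e. the row is kept
mask = [v == c for v, c in zip(values, carried)]
print(mask)
```

With the dataframe this would read `mask = (df.A == list(accumulate(df.A, partial(change, pct=0.5))))`.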