Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: complex filtering with apply

Tags:

python

pandas

Lets suppose this dataframe, which I want to filter in such a way I iterate from the last index backwards until I find two consecutive 'a' = 0. Once that happens, the rest of the dataframe (including both zeros) shall be filtered:

    a
1   6.5
2   0
3   0
4   4.0
5   0
6   3.2

Desired result:

    a
4   4.0
5   0
6   3.2

My initial idea was ussing apply for filtering, and inside the apply function using shift(1) == 0 & shift(2) == 0, but based of that I could filter each row individually, but not returning false for the remaining rows after the double zero is found unless I use a global variable or something nasty like that.

Any smart way of doing this?

like image 200
Roman Rdgz Avatar asked Jan 26 '26 18:01

Roman Rdgz


1 Answers

You could do that with sort_index with ascending=False, cumsum and dropna:

In [89]: df[(df.sort_index(ascending=False) == 0).cumsum() < 2].dropna()
Out[89]:
     a
4  4.0
5  0.0
6  3.2

Step by step:

In [99]: df.sort_index(ascending=False)
Out[99]:
     a
6  3.2
5  0.0
4  4.0
3  0.0
2  0.0
1  6.5

In [100]: df.sort_index(ascending=False) == 0
Out[100]:
       a
6  False
5   True
4  False
3   True
2   True
1  False

In [101]: (df.sort_index(ascending=False) == 0).cumsum()
Out[101]:
   a
6  0
5  1
4  1
3  2
2  3
1  3

In [103]: (df.sort_index(ascending=False) == 0).cumsum() < 2
Out[103]:
       a
6   True
5   True
4   True
3  False
2  False
1  False

In [104]: df[(df.sort_index(ascending=False) == 0).cumsum() < 2]
Out[104]:
     a
1  NaN
2  NaN
3  NaN
4  4.0
5  0.0
6  3.2

EDIT

IIUC you could use something like that using pd.rolling_sum and first_valid_index if your index started from 1:

df_sorted = df.sort_index(ascending=False)
df[df_sorted[(pd.rolling_sum((df_sorted==0), window=2) == 2)].first_valid_index()+1:]

With the @jezrael example:

In [208]: df
Out[208]:
      a
1   6.5
2   0.0
3   0.0
4   7.0
5   0.0
6   0.0
7   0.0
8   4.0
9   0.0
10  0.0
11  3.2
12  5.0

df_sorted = df.sort_index(ascending=False)

In [210]: df[df_sorted[(pd.rolling_sum((df_sorted==0), window=2) == 2)].first_valid_index()+1:]
Out[210]:
      a
11  3.2
12  5.0
like image 130
Anton Protopopov Avatar answered Jan 28 '26 06:01

Anton Protopopov



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!