
Removing rows that don't fit a repeating sequence in a pandas DataFrame

I have a pandas dataframe that looks like this:

    A   B   C   D
0   1   2   3   0
1   4   5   6   1
2   7   8   9   2
3   10  10  10  0
4   10  10  10  1
5   1   2   3   0
6   4   5   6   1
7   7   8   8   2

I would like to remove every group of rows whose values in column 'D' do not form the complete sequence 0, 1, 2, in that exact order.

The new dataframe I would like to obtain should look like this:

    A   B   C   D
0   1   2   3   0
1   4   5   6   1
2   7   8   9   2
3   1   2   3   0
4   4   5   6   1
5   7   8   8   2

Rows 3 and 4 are dropped because they start the sequence (0, 1) but row 5 does not have 2 in column 'D'.

AjWinston asked Dec 22 '25 22:12


1 Answer

A possible solution based on numpy:

import numpy as np

w = np.lib.stride_tricks.sliding_window_view(df['D'], 3) # rolling windows of 3 consecutive values
idx = np.flatnonzero((w == (0,1,2)).all(1)) # starting indexes of seq 0, 1, 2
df.iloc[(idx[:, None] + np.arange(3)).ravel()].reset_index(drop=True) # rows of every matching triplet

This uses numpy's sliding_window_view to build a rolling 3-element view over column D. Comparing each window element-wise with (0,1,2) and applying all along axis 1 flags the windows that match the whole sequence, and flatnonzero returns their starting positions. Broadcasting those positions against np.arange(3) expands each one into the full triplet of row positions; iloc then selects the corresponding rows, and reset_index(drop=True) renumbers the index from 0.

Output:

   A  B  C  D
0  1  2  3  0
1  4  5  6  1
2  7  8  9  2
3  1  2  3  0
4  4  5  6  1
5  7  8  8  2

Intermediates:

# w == (0,1,2)

array([[ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [ True,  True, False],
       [False, False, False],
       [ True,  True,  True]])

# idx[:, None]

array([[0],
       [5]])

# + np.arange(3)

array([[0, 1, 2],
       [5, 6, 7]])

# .ravel()

array([0, 1, 2, 5, 6, 7])

To make this solution more general, define:

seq = (0,1,2)
n = len(seq)

then replace the hard-coded values with:

  • .sliding_window_view(..., n)
  • w == seq
  • np.arange(n)

(thanks @wjandrea)
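
Putting the generalization together, a minimal self-contained sketch could look like the following. The function name keep_sequence_rows and its parameters are hypothetical, chosen only for illustration. Two details go beyond the snippet above and are assumptions on my part: np.unique replaces .ravel() so that overlapping matches (possible when seq repeats a value, e.g. (0, 0)) do not select the same row twice, and a length guard avoids the ValueError that sliding_window_view raises when the column is shorter than the sequence.

```python
import numpy as np
import pandas as pd


def keep_sequence_rows(df, column, seq):
    """Keep only rows that belong to a complete run of `seq` in `column`.

    Hypothetical helper name; a sketch of the generalized approach.
    """
    seq = np.asarray(seq)
    n = len(seq)
    values = df[column].to_numpy()

    # sliding_window_view raises if the window is longer than the data,
    # so return an empty frame in that case (assumed desired behavior).
    if len(values) < n:
        return df.iloc[0:0].reset_index(drop=True)

    # One row per window of n consecutive values.
    w = np.lib.stride_tricks.sliding_window_view(values, n)
    # Starting positions of the windows that match seq exactly.
    idx = np.flatnonzero((w == seq).all(axis=1))
    # Expand each start into n row positions; np.unique flattens, sorts,
    # and removes duplicates caused by overlapping matches.
    rows = np.unique(idx[:, None] + np.arange(n))
    return df.iloc[rows].reset_index(drop=True)


# The dataframe from the question.
df = pd.DataFrame({
    "A": [1, 4, 7, 10, 10, 1, 4, 7],
    "B": [2, 5, 8, 10, 10, 2, 5, 8],
    "C": [3, 6, 9, 10, 10, 3, 6, 8],
    "D": [0, 1, 2, 0, 1, 0, 1, 2],
})
result = keep_sequence_rows(df, "D", (0, 1, 2))
print(result)
```

For a sequence of distinct values such as (0, 1, 2), matches cannot overlap, so np.unique and .ravel() select the same rows.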

PaulS answered Dec 24 '25 12:12