In Python (Pandas/Numpy), How to subset a df using a condition and a specific chunk size?

Question

I have a DataFrame:

import pandas as pd

A = pd.DataFrame([[1, 5, 2, 0], [2, 4, 4, 0], [3, 3, 1, 1], [4, 2, 2, 0], [5, 1, 4, 0], [2, 4, 4, 0], [3, 3, 1, 1], [4, 2, 2, 0], [5, 1, 4, 0]],
                 columns=['A', 'B', 'C', 'D'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9])

I want to subset the DataFrame according to the following rule: select the rows where column 'D' equals 1, and also include the two rows directly above each of them (chunk size = 3).

Applying this rule to the example DataFrame, the output should be:

   A  B  C  D
1  1  5  2  0
2  2  4  4  0
3  3  3  1  1
5  5  1  4  0
6  2  4  4  0
7  3  3  1  1

Thanks

behzad.nouri · Accepted Answer

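This works with any chunk size. Subtracting the mask shifted up by `chunk` rows leaves +1 on each matching row and -1 on the row just above its chunk, so a cumulative sum taken from the bottom is positive exactly on the rows inside a chunk:

>>> chunk = 3
>>> mask = (A['D'] == 1).astype(int)  # integer mask, so the subtraction is well defined
>>> mask -= mask.shift(-chunk).fillna(0)  # -1 lands on the row just above each chunk
>>> A[mask[::-1].cumsum() > 0]  # bottom-up cumulative sum is positive inside a chunk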
   A  B  C  D
1  1  5  2  0
2  2  4  4  0
3  3  3  1  1
5  5  1  4  0
6  2  4  4  0
7  3  3  1  1
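
As a cross-check, the same selection can be sketched with a reversed rolling maximum instead of the shift/cumsum trick. This is only an illustrative alternative, not the method from the answer above; the function name `rows_with_preceding` and its parameters are made up for this example and are not part of pandas.

```python
import pandas as pd


def rows_with_preceding(df, condition, chunk):
    """Keep each row where `condition` is True plus the (chunk - 1) rows above it.

    Hypothetical helper for illustration; `condition` is a boolean Series
    that shares its index with `df`.
    """
    flags = condition.astype(int)
    # Reverse the flags, take a rolling max over `chunk` rows, then reverse back.
    # Each row ends up with 1 if it or one of the (chunk - 1) rows below it is flagged.
    spread = flags[::-1].rolling(chunk, min_periods=1).max()[::-1]
    return df[spread.astype(bool)]


A = pd.DataFrame(
    [[1, 5, 2, 0], [2, 4, 4, 0], [3, 3, 1, 1], [4, 2, 2, 0], [5, 1, 4, 0],
     [2, 4, 4, 0], [3, 3, 1, 1], [4, 2, 2, 0], [5, 1, 4, 0]],
    columns=['A', 'B', 'C', 'D'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9])

result = rows_with_preceding(A, A['D'] == 1, chunk=3)
print(result)
```

On the example DataFrame this keeps the rows labelled 1, 2, 3, 5, 6 and 7, matching the output above. Overlapping chunks are merged rather than duplicated, since each row is either kept or dropped once.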

Tags: python, pandas, dataframe, numpy, subset

hernanavella
