A solution for filtering some rows of data based on a condition in pandas

I have the following example data and would like to filter it: whenever a row has col1 = 'A' and col2 = 0, I want to keep that row and all following rows up to (but not including) the next row with col1 = 'A'.
I want to do this with a pandas DataFrame, but I don't know how.

Here col1 takes the values 'A', 'B' and 'C'; col2 is 0 or 1 and is only set on rows where col1 is 'A'.

For example, we have this data:

col1 col2
 A    0
 C
 A    1 
 B
 C
 A    1 
 B
 B
 C
 A    0 
 B 
 C
 A    1 
 B 
 C
 C 

The result I want to achieve is:

col1 col2 
 A    0 
 C 
 A    0 
 B 
 C 

Thank you very much

user13651815 asked May 31 '20 09:05


1 Answer

We first group the rows into blocks, each starting at an 'A' row, and then propagate the first col2 value of each block to all rows of that block. From this result we keep every row whose propagated value is 0.

 df[df.groupby(df.col1.eq('A').cumsum()).col2.transform('first').eq(0)]

Sample data:

import pandas as pd

df = pd.DataFrame({'col1': list('ACABCABBCABCABCC'),
                   'col2': [0, None, 1, None, None, 1, None, None, None, 0, None, None, 1, None, None, None]}
                 ).astype({'col2': 'Int32'})

Result:

   col1  col2
0     A     0
1     C  <NA>
9     A     0
10    B  <NA>
11    C  <NA>
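
To make the one-liner easier to follow, it can be unpacked into intermediate steps. This is only a sketch of the same approach on the same sample data; the names block_id, block_flag, mask and result are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Same sample data as above; col2 is only set on rows where col1 == 'A'.
df = pd.DataFrame({
    'col1': list('ACABCABBCABCABCC'),
    'col2': [0, None, 1, None, None, 1, None, None, None,
             0, None, None, 1, None, None, None],
}).astype({'col2': 'Int32'})

# Step 1: every 'A' starts a new block; the running count of 'A' rows
# gives each block its own label (1, 1, 2, 2, 2, 3, ...).
block_id = df['col1'].eq('A').cumsum()

# Step 2: broadcast the first non-null col2 value of each block
# to every row of that block.
block_flag = df.groupby(block_id)['col2'].transform('first')

# Step 3: keep only the rows whose block started with col2 == 0.
mask = block_flag.eq(0)
result = df[mask]

print(result.index.tolist())  # [0, 1, 9, 10, 11]
```

Note that any rows before the first 'A' would fall into a block labelled 0, and 'first' skips missing values, so such rows are dropped as long as their col2 is empty.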
Stef answered Sep 29 '22 11:09