Test if any column of a pandas DataFrame satisfies a condition

I have a DataFrame with many columns. I need a condition that checks, for a subset of those columns, whether any of them is different from zero.

Is there any more elegant way to apply that condition to a subset of columns? My current code is:

df['indicator'] = (
    (df['col_1'] != 0) | 
    (df['col_2'] != 0) | 
    (df['col_3'] != 0) | 
    (df['col_4'] != 0) | 
    (df['col_5'] != 0)
)

I was looking for something like this pseudo code:

columns = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']
df['indicator'] = df.any(columns, lambda value: value != 0)

Matthias asked Apr 04 '18 13:04


2 Answers

ne is the method form of !=. I use it so that chaining any reads more cleanly. any(axis=1) then checks, row by row, whether any of the values is True.

df['indicator'] = df[columns].ne(0).any(axis=1)
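
To see what each step of that one-liner produces, here is a minimal sketch on a small made-up DataFrame. The sample values and the extra column named other are assumptions for illustration only; the column names col_1 to col_3 follow the question.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "col_1": [0, 1, 0],
    "col_2": [0, 0, 0],
    "col_3": [0, 0, -2],
    "other": [5, 5, 5],  # deliberately left out of the checked subset
})
columns = ["col_1", "col_2", "col_3"]

# Step 1: element-wise "!= 0" gives a boolean DataFrame of the same shape.
nonzero = df[columns].ne(0)

# Step 2: any(axis=1) reduces across the columns, one result per row.
df["indicator"] = nonzero.any(axis=1)

print(df["indicator"].tolist())  # [False, True, True]
```

Selecting df[columns] first keeps columns outside the subset (here, other) from influencing the result.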

piRSquared answered Oct 19 '22 23:10


In this particular case you could also check whether the sum of the corresponding columns is != 0 (this assumes the values cannot cancel each other out, e.g. they are all non-negative):

df['indicator'] = df[columns].sum(axis=1).ne(0)

PS @piRSquared's solution is much more generic...
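
A sum-based check only matches the "any column is non-zero" test when values in a row cannot cancel each other out. The sketch below, using made-up data with one negative value, shows where the two diverge; taking abs() before summing is one possible workaround and is my suggestion, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: row 1 contains 1 and -1, which sum to zero.
df = pd.DataFrame({"col_1": [0, 1, 2], "col_2": [0, -1, 0]})
columns = ["col_1", "col_2"]

by_sum = df[columns].sum(axis=1).ne(0)            # misses the cancelling row
by_abs_sum = df[columns].abs().sum(axis=1).ne(0)  # assumption: abs() as a fix
by_any = df[columns].ne(0).any(axis=1)            # the generic approach

print(by_sum.tolist())      # [False, False, True]
print(by_abs_sum.tolist())  # [False, True, True]
print(by_any.tolist())      # [False, True, True]
```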

MaxU - stop WAR against UA answered Oct 20 '22 00:10