
Pandas: Apply filter on a subset of columns

Let's say I have data like this:

import pandas as pd

df = pd.DataFrame({'category': ["blue", "blue", "blue", "blue", "green"], 'val1': [5, 3, 2, 2, 5], 'val2': [1, 3, 2, 2, 5]})
print(df)

  category  val1  val2
0     blue     5     1
1     blue     3     3
2     blue     2     2
3     blue     2     2
4    green     5     5

I want to get the rows where any value is > 3. For this example, which has only two numeric columns, I can just do:

df.loc[(df['val1'] > 3) | (df['val2'] > 3)]

  category  val1  val2
0     blue     5     1
4    green     5     5

Now let's say I have a dataset with a large number of (numeric) columns, and I want to get all rows where the value in any of the numeric columns meets a condition (for example, being > 3). Is there a way to check a condition over multiple columns without having to chain them with | ?

So, for example, let's say I have a dataframe with n columns named val1 to valn and I want all rows where any of the values in val1 to valn is > 3. Is there a better or shorter way of doing it than

df.loc[(df['val1'] > 3) | (df['val2'] > 3) | ... | (df['valn'] > 3)]

?

Christian O. asked Jan 24 '23 13:01


1 Answer

You can use DataFrame.any() as shown below. This works for any number of columns (the first column, 'category', is skipped because it is not numeric):

res = df[(df.iloc[:, 1:] > 3).any(axis=1)]

Result for your current dataframe is:

>>> print(res)

  category  val1  val2
0     blue     5     1
4    green     5     5
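
The positional slice iloc[:, 1:] assumes that every column after the first is numeric. As a minimal sketch of a variation (not taken from the answer itself), the numeric columns could instead be picked by dtype with select_dtypes, or by name with filter. The intermediate names numeric, mask and res_by_name are illustrative only, and the regex assumes the columns really are named val1 to valn as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["blue", "blue", "blue", "blue", "green"],
    "val1": [5, 3, 2, 2, 5],
    "val2": [1, 3, 2, 2, 5],
})

# Select the numeric columns by dtype instead of by position.
numeric = df.select_dtypes(include="number")

# Element-wise comparison yields a boolean DataFrame of the same shape.
mask = numeric > 3

# any(axis=1) reduces each row to True if at least one column is True.
res = df[mask.any(axis=1)]
print(res)

# Alternative: select the columns by name pattern (val1 ... valn).
res_by_name = df[(df.filter(regex=r"^val\d+$") > 3).any(axis=1)]
```

Swapping any(axis=1) for all(axis=1) would keep only the rows where every selected column meets the condition.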
IoaTzimas answered Jan 27 '23 03:01
