Let's say I have data like this:
import pandas as pd
df = pd.DataFrame({'category': ["blue", "blue", "blue", "blue", "green"], 'val1': [5, 3, 2, 2, 5], 'val2': [1, 3, 2, 2, 5]})
print(df)
  category  val1  val2
0     blue     5     1
1     blue     3     3
2     blue     2     2
3     blue     2     2
4    green     5     5
I want to get the rows where any value is > 3. For my example here, which has only two columns, I can just do:
df.loc[(df['val1'] > 3) | (df['val2'] > 3)]
  category  val1  val2
0     blue     5     1
4    green     5     5
Now let's say I have a dataset with a large number of (numeric) columns, and I want to get all rows where the value of any of the numeric columns meets a condition (for example, being > 3). Is there a way to check a condition over multiple columns without having to chain them with |?
So for example, let's say I have a dataframe with n columns named val1 to valn and I want all rows where any of the values in val1 to valn is > 3. Is there a better or shorter way of doing it than the following?
df.loc[(df['val1'] > 3) | (df['val2'] > 3) | ... | (df['valn'] > 3)]
You can use df.any() as below. This works for any number of columns (we skip the first column, 'category', because it is not numeric):
res = df[(df.iloc[:, 1:] > 3).any(axis=1)]
Result for your current dataframe is:
>>> print(res)
  category  val1  val2
0     blue     5     1
4    green     5     5
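If the non-numeric columns are not all at the front, slicing by position with iloc can pick up the wrong columns. As a rough sketch (not part of the answer above), you could select the columns by dtype with select_dtypes, or by name with filter if they really are called val1 to valn. The names res_by_dtype and res_by_name and the regex are just illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["blue", "blue", "blue", "blue", "green"],
    "val1": [5, 3, 2, 2, 5],
    "val2": [1, 3, 2, 2, 5],
})

# Option 1: pick the numeric columns by dtype, so column order does not matter.
numeric = df.select_dtypes(include="number")
res_by_dtype = df[numeric.gt(3).any(axis=1)]

# Option 2: pick columns by name, assuming they follow the val1 ... valn pattern.
vals = df.filter(regex=r"^val\d+$")
res_by_name = df[(vals > 3).any(axis=1)]

print(res_by_dtype)
```

Both variants build the same boolean row mask as the iloc version; they only differ in how the columns to test are chosen.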