I have a pandas dataframe containing rows with numbered columns:
1 2 3 4 5
a 0 0 0 0 1
b 1 1 2 1 9
c 2 2 2 2 2
d 5 5 5 5 5
e 8 9 9 9 9
How can I filter out the rows where a subset of columns are all above or below a certain value?
So, for example: I want to keep only the rows where the values in columns 1 to 3 are all > 3. In the above, that would leave me with only rows d and e.
The columns I am filtering and the value I am checking against are both arguments.
I've tried a few things, this is the closest I've gotten:
df[df[range(1,3)]>3]
Any ideas?
You can select rows from a pandas DataFrame based on column values or on multiple conditions using the DataFrame.loc[] indexer, the DataFrame.query() method, or DataFrame.apply() with a lambda function.
Use the syntax new_DataFrame = DataFrame[(DataFrame[column1] == criteria1) operator (DataFrame[column2] == criteria2)], where operator is & or |, to filter a pandas.DataFrame by multiple columns.
You can get a pandas.Series of bool that is the AND of two conditions using &. The second condition can be written with == and negated with ~, or more directly with !=.
When filtering a DataFrame by one column, we simply compare that column's values against a condition. To filter by multiple columns, combine the per-column conditions with the & operator (not &&, which is not valid Python), wrapping each condition in parentheses.
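As a rough illustration of the &, | and ~ operators described above, here is a minimal sketch on the frame from the question. The variable names (both, either, negated) and the particular conditions are made up for the example.

```python
import pandas as pd

# Sample frame from the question; the column labels are the integers 1..5.
df = pd.DataFrame(
    [[0, 0, 0, 0, 1],
     [1, 1, 2, 1, 9],
     [2, 2, 2, 2, 2],
     [5, 5, 5, 5, 5],
     [8, 9, 9, 9, 9]],
    index=list("abcde"),
    columns=[1, 2, 3, 4, 5],
)

# Each comparison yields a boolean Series. The parentheses are needed
# because & and | bind more tightly than > or ==.
both = df[(df[1] > 3) & (df[5] != 5)]    # AND of two conditions
either = df[(df[1] > 3) | (df[5] == 1)]  # OR of two conditions
negated = df[~(df[1] > 3)]               # NOT of one condition
```

This works for a fixed pair of columns, but it does not scale to an arbitrary list of columns passed as an argument, which is what the question asks for.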
I used loc and all in this function:
def filt(df, cols, thresh):
    return df.loc[(df[cols] > thresh).all(axis=1)]
filt(df, [1, 2, 3], 3)
1 2 3 4 5
d 5 5 5 5 5
e 8 9 9 9 9
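The question also mentions filtering on values below a threshold. One possible way to cover both directions is to pass the comparison in as an argument. This is only a sketch: filt_cmp and its compare parameter are hypothetical names, not part of the answer above.

```python
import operator

import pandas as pd

# Sample frame from the question; the column labels are the integers 1..5.
df = pd.DataFrame(
    [[0, 0, 0, 0, 1],
     [1, 1, 2, 1, 9],
     [2, 2, 2, 2, 2],
     [5, 5, 5, 5, 5],
     [8, 9, 9, 9, 9]],
    index=list("abcde"),
    columns=[1, 2, 3, 4, 5],
)


def filt_cmp(df, cols, thresh, compare=operator.gt):
    """Keep rows where compare(value, thresh) holds for every column in cols.

    Hypothetical helper: compare is any element-wise comparison,
    e.g. operator.gt, operator.lt, operator.ge or operator.le.
    """
    mask = compare(df[cols], thresh).all(axis=1)
    return df.loc[mask]


above = filt_cmp(df, [1, 2, 3], 3)               # all selected values > 3
below = filt_cmp(df, [1, 2, 3], 3, operator.lt)  # all selected values < 3
```

With the sample data, above holds rows d and e, and below holds rows a, b and c.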
You can achieve this without using apply:
In [73]:
df[(df.iloc[:, 0:3] > 3).all(axis=1)]
Out[73]:
1 2 3 4 5
d 5 5 5 5 5
e 8 9 9 9 9
This slices the df to just the first 3 columns using iloc (the original answer used ix, which was removed in pandas 1.0), compares them against the scalar 3, and then calls all(axis=1) to create a boolean Series that masks the rows.
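To make the steps just described easier to follow, here is a sketch that builds the mask one stage at a time. It uses iloc for positional slicing, since ix is not available in pandas 1.0 and later. The intermediate variable names are chosen only for illustration.

```python
import pandas as pd

# Sample frame from the question; the column labels are the integers 1..5.
df = pd.DataFrame(
    [[0, 0, 0, 0, 1],
     [1, 1, 2, 1, 9],
     [2, 2, 2, 2, 2],
     [5, 5, 5, 5, 5],
     [8, 9, 9, 9, 9]],
    index=list("abcde"),
    columns=[1, 2, 3, 4, 5],
)

subset = df.iloc[:, 0:3]     # first three columns, selected by position
is_above = subset > 3        # boolean DataFrame, same shape as subset
mask = is_above.all(axis=1)  # boolean Series: True where the whole row is True
by_pos = df[mask]            # rows d and e, with all five columns kept

# Label-based equivalent: loc slices by column label and includes both ends.
by_label = df[(df.loc[:, 1:3] > 3).all(axis=1)]
```

Note that iloc[:, 0:3] excludes the end position while loc[:, 1:3] includes the end label, so both select columns 1, 2 and 3 here.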