Filter pandas dataframe rows by multiple column values

I have a pandas dataframe containing rows with numbered columns:

    1  2  3  4  5
a   0  0  0  0  1
b   1  1  2  1  9
c   2  2  2  2  2
d   5  5  5  5  5
e   8  9  9  9  9

How can I filter out the rows where a subset of columns are all above or below a certain value?

For example, I want to keep only the rows where every value in columns 1 to 3 is > 3. In the above, that would leave me with only rows d and e.

The columns I am filtering and the value I am checking against are both arguments.

I've tried a few things; this is the closest I've gotten:

df[df[range(1,3)]>3]

Any ideas?

asked Jun 09 '16 22:06 by quannabe


2 Answers

I used loc and all in this function:

def filt(df, cols, thresh):
    # keep only the rows where every column in cols is greater than thresh
    return df.loc[(df[cols] > thresh).all(axis=1)]

filt(df, [1, 2, 3], 3)

   1  2  3  4  5
d  5  5  5  5  5
e  8  9  9  9  9
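The question also mentions values below a threshold, so the same idea can be sketched with the comparison passed in as an argument. This is only an illustration: `filt_cmp` and its `compare` parameter are hypothetical names, not part of the answer above, and the frame is rebuilt from the question's example.

```python
import operator

import pandas as pd

# Rebuild the example frame from the question (integer column labels 1..5).
df = pd.DataFrame(
    [[0, 0, 0, 0, 1],
     [1, 1, 2, 1, 9],
     [2, 2, 2, 2, 2],
     [5, 5, 5, 5, 5],
     [8, 9, 9, 9, 9]],
    index=list("abcde"),
    columns=[1, 2, 3, 4, 5],
)


def filt_cmp(df, cols, thresh, compare=operator.gt):
    """Keep rows where compare(value, thresh) holds for every column in cols.

    Hypothetical helper: `compare` is any elementwise comparison,
    for example operator.gt or operator.lt.
    """
    mask = compare(df[cols], thresh).all(axis=1)
    return df.loc[mask]


above = filt_cmp(df, [1, 2, 3], 3)               # columns 1-3 all > 3
below = filt_cmp(df, [1, 2, 3], 3, operator.lt)  # columns 1-3 all < 3
print(above)
print(below)
```

With the example data, `above` keeps rows d and e, while `below` keeps rows a, b and c.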

answered Nov 14 '22 23:11 by piRSquared


You can achieve this without using apply:

In [73]:
df[(df.iloc[:, 0:3] > 3).all(axis=1)]

Out[73]:
   1  2  3  4  5
d  5  5  5  5  5
e  8  9  9  9  9

This slices the df to just the first 3 columns by position using iloc, compares them against the scalar 3, and then calls all(axis=1) to create a boolean Series that masks the rows.
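The masking steps described above can be broken out one at a time to see what each produces. This is a rough sketch rather than the answer's own code: the frame is rebuilt from the question, and the intermediate names (`cols`, `bool_frame`, `mask`, `result`) are made up for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (integer column labels 1..5).
df = pd.DataFrame(
    [[0, 0, 0, 0, 1],
     [1, 1, 2, 1, 9],
     [2, 2, 2, 2, 2],
     [5, 5, 5, 5, 5],
     [8, 9, 9, 9, 9]],
    index=list("abcde"),
    columns=[1, 2, 3, 4, 5],
)

cols = df.columns[:3]          # first three columns by position (labels 1, 2, 3)
bool_frame = df[cols] > 3      # elementwise comparison, same shape as df[cols]
mask = bool_frame.all(axis=1)  # one boolean per row: True only if all three pass
result = df[mask]              # boolean Series selects the matching rows

print(bool_frame)
print(mask)
print(result)
```

Here `mask` is False for rows a, b and c and True for d and e, so `result` contains only rows d and e.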

answered Nov 15 '22 00:11 by EdChum