 

Dataframe filtering rows by column values

I have a Dataframe df

       Num1   Num2 
one       1      0
two       3      2
three     5      4
four      7      6
five      9      8

I want to keep only the rows that have a value greater than 3 in Num1 and less than 8 in Num2.

I tried this

df = df[df['Num1'] > 3 and df['Num2'] < 8]

but this error occurred:

ValueError: The truth value of a Series is ambiguous.

So I used two steps instead:

df = df[df['Num1'] > 3]
df = df[df['Num2'] < 8]

I think the code can be shorter.

Is there any other way?

asked Jun 11 '17 09:06 by Seunghun Choi



2 Answers

You need to add parentheses, because of operator precedence with the bitwise operator &:

df1 = df[(df['Num1'] > 3) & (df['Num2'] < 8)]
print (df1)
       Num1  Num2
three     5     4
four      7     6

Better explanation is here.

Or, if you need the shortest code, use query:

df1 = df.query("Num1 > 3 and Num2 < 8")
print (df1)
       Num1  Num2
three     5     4
four      7     6

df1 = df.query("Num1 > 3 &  Num2 < 8")
print (df1)
       Num1  Num2
three     5     4
four      7     6
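For anyone who wants to run the two approaches above side by side, here is a minimal sketch. It rebuilds the DataFrame from the question (the construction code is an assumption, since the question only shows the printed frame) and checks that the boolean mask and the query string select the same rows.

```python
import pandas as pd

# Rebuild the DataFrame from the question (construction is assumed, not from the post).
df = pd.DataFrame(
    {"Num1": [1, 3, 5, 7, 9], "Num2": [0, 2, 4, 6, 8]},
    index=["one", "two", "three", "four", "five"],
)

# Boolean mask: each comparison is wrapped in parentheses, then combined with &.
mask = (df["Num1"] > 3) & (df["Num2"] < 8)
by_mask = df[mask]

# The same filter written as a query string.
by_query = df.query("Num1 > 3 and Num2 < 8")

print(by_mask)
print(by_mask.equals(by_query))
```

Both forms return a new DataFrame and leave df unchanged, so the result can be assigned back to df if that is what you want.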
answered Oct 26 '22 03:10 by jezrael


Yes, you can use the & operator:

df = df[(df['Num1'] > 3) & (df['Num2'] < 8)]
#                        ^ & operator

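This is because and evaluates the truth value of its operands and cannot be overloaded, whereas the & operator can be defined by any class (through __and__). pandas defines & to work element-wise on Series, but a Series with more than one element has no single truth value, which is why and raises the ValueError.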
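The difference between the two operators can be seen on a pair of small boolean Series. This is only an illustrative sketch; s1 and s2 are made-up names, not part of the question.

```python
import pandas as pd

s1 = pd.Series([True, False, True])
s2 = pd.Series([True, True, False])

# `and` first asks for bool(s1), which pandas refuses for a multi-element Series.
try:
    s1 and s2
    and_error = None
except ValueError as exc:
    and_error = str(exc)

# `&` dispatches to Series.__and__, which pandas implements element-wise.
elementwise = s1 & s2

print(and_error)
print(elementwise.tolist())  # [True, False, False]
```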

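The brackets are mandatory here, because & binds more tightly than > and <, so without brackets, Python would read the expression as df['Num1'] > (3 & df['Num2']) < 8. That is a chained comparison, which Python evaluates with an implicit and, so it fails with the same ValueError.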
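One way to check the grouping for yourself is to ask Python's own parser. The sketch below uses the standard ast module on a simplified expression, with a and b as placeholder names standing in for the two columns; ast.unparse needs Python 3.9 or later.

```python
import ast

# Parse the unparenthesised expression; a and b are placeholder names.
tree = ast.parse("a > 3 & b < 8", mode="eval").body

# The top-level node is one chained comparison, not a bitwise AND.
print(type(tree).__name__)                     # Compare
print([type(op).__name__ for op in tree.ops])  # ['Gt', 'Lt']

# The middle operand of the chain is the sub-expression `3 & b`.
print(ast.unparse(tree.comparators[0]))        # 3 & b
```

With the parentheses added, the top-level node becomes the & operation and each comparison is one of its operands, which is the grouping the filter needs.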

Note that you can use the | operator as a logical or.
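As a short sketch of the | operator on the same data (the DataFrame construction is assumed from the question's printout): the example also uses ~ to negate a mask, which is not mentioned in the answer above but follows the same parenthesising rule.

```python
import pandas as pd

# Rebuild the DataFrame from the question (construction is assumed).
df = pd.DataFrame(
    {"Num1": [1, 3, 5, 7, 9], "Num2": [0, 2, 4, 6, 8]},
    index=["one", "two", "three", "four", "five"],
)

# Rows where either condition holds.
either = df[(df["Num1"] <= 3) | (df["Num2"] >= 8)]

# ~ negates a boolean mask; here it inverts the original "and" filter.
not_both = df[~((df["Num1"] > 3) & (df["Num2"] < 8))]

print(either.index.tolist())
print(either.equals(not_both))
```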

answered Oct 26 '22 03:10 by Willem Van Onsem