pandas logical and operator with and without brackets produces different results [duplicate]

Tags:

python

pandas

I have just noticed this:

df[df.condition1 & df.condition2]
df[(df.condition1) & (df.condition2)]

Why does the output of these two lines differ?

I cannot share the exact data, but I will try to provide as much detail as I can:

df[df.col1 == False & df.col2.isnull()] # returns 33 rows and the rule `df.col2.isnull()` is not in effect
df[(df.col1 == False) & (df.col2.isnull())] # returns 29 rows and both conditions are applied correctly

Thanks to @jezrael and @ayhan, here is what happened, using the example provided by @jezrael:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

print(df)
    col1  col2
0   True   4.0
1  False   NaN
2  False   NaN
3  False   1.0

If we take a look at row 3:

    col1  col2
3  False   1.0

and the way I wrote the condition:

df.col1 == False & df.col2.isnull() # for row 3 this is False == False & False

Because the & operator has higher precedence than ==, the expression False == False & False without brackets is equivalent to:

False == (False & False)
print(False == (False & False)) # prints True

With brackets:

print((False == False) & False) # prints False
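
The same grouping can be checked on the whole sample frame rather than just row 3. This is a minimal sketch that rebuilds the example frame above; the variable names `without_brackets` and `with_brackets` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

# Parsed as df.col1 == (False & df.col2.isnull()).
# False & <anything> is all False, so this only tests col1 == False.
without_brackets = df[df.col1 == False & df.col2.isnull()]

# Each condition becomes a boolean mask before the masks are combined.
with_brackets = df[(df.col1 == False) & (df.col2.isnull())]

print(len(without_brackets))  # 3 (rows 1, 2 and 3)
print(len(with_brackets))     # 2 (rows 1 and 2)
```

Row 3 is the one that slips through without brackets, because the `isnull()` condition never takes part in the comparison.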

I think it is a bit easier to illustrate this problem with numbers:

print(5 == 5 & 1) # prints False, because 5 & 1 returns 1 and 5==1 returns False
print(5 == (5 & 1)) # prints False, same reason as above
print((5 == 5) & 1) # prints 1, because 5 == 5 returns True, and True & 1 returns 1
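
One way to see the grouping without evaluating anything is to ask Python's own parser, using the standard `ast` module. The helper name `top_level_node` is hypothetical and is added here only for illustration.

```python
import ast

def top_level_node(expr: str) -> str:
    """Return the name of the outermost AST node of an expression."""
    return type(ast.parse(expr, mode="eval").body).__name__

# Without brackets the outermost operation is the comparison,
# so `b & c` is computed first and `a` is compared with the result.
print(top_level_node("a == b & c"))    # Compare

# With brackets the outermost operation is the bitwise AND.
print(top_level_node("(a == b) & c"))  # BinOp
```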

So, lesson learned: always add brackets!

I wish I could split the answer points between @jezrael and @ayhan :(

236

asked Feb 20 '17 06:02

Cheng

1 Answer

There is no difference between df[condition1 & condition2] and df[(condition1) & (condition2)]. The difference arises when you write the comparisons inline, because the & operator takes precedence over comparison operators such as >, < and ==:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=list('abc'))
df
Out: 
   a  b  c
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

condition1 = df['a'] > 3
condition2 = df['b'] < 5

df[condition1 & condition2]
Out: 
   a  b  c
0  5  0  3

df[(condition1) & (condition2)]
Out: 
   a  b  c
0  5  0  3

However, if you write the comparisons inline without parentheses, you get an error:

df[df['a'] > 3 & df['b'] < 5]
Traceback (most recent call last):

  File "<ipython-input-7-9d4fd21246ca>", line 1, in <module>
    df[df['a'] > 3 & df['b'] < 5]

  File "/home/ayhan/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 892, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

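This is because 3 & df['b'] is evaluated first (this corresponds to False & df.col2.isnull() in your example). What remains is the chained comparison df['a'] > (3 & df['b']) < 5, which Python evaluates with an implicit and, so it asks a Series for a single truth value and raises the ValueError. So you need to group the conditions in parentheses: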

df[(df['a'] > 3) & (df['b'] < 5)]
Out[8]: 
   a  b  c
0  5  0  3
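
As a sketch of what happens step by step, the snippet below rebuilds the frame from the output above with fixed values (the original used random data) and evaluates the pieces separately. The names `inner`, `raised` and `result` are illustrative.

```python
import pandas as pd

# Same values as the random frame shown above, hard-coded for reproducibility.
df = pd.DataFrame({'a': [5, 3, 3, 4, 8],
                   'b': [0, 7, 5, 7, 8],
                   'c': [3, 9, 2, 6, 1]})

# `&` binds tighter than `>` and `<`, so this part is computed first.
inner = 3 & df['b']           # element-wise bitwise AND, not a boolean mask
print(inner.tolist())         # [0, 3, 1, 3, 0]

# The rest is a chained comparison: df['a'] > inner < 5, i.e.
# (df['a'] > inner) and (inner < 5). The implicit `and` asks a Series
# for a single truth value, which pandas refuses to give.
try:
    df[df['a'] > 3 & df['b'] < 5]
    raised = False
except ValueError:
    raised = True
print(raised)                 # True

# With parentheses, each comparison yields a boolean mask first.
result = df[(df['a'] > 3) & (df['b'] < 5)]
print(result.index.tolist())  # [0]
```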

100

answered Sep 22 '22 05:09

ayhan
