if-else vs "or" operation for None-check

Tags:

python

Let's say we have a dict that always has the keys first_name and last_name, but their values may be None.

{
    'first_name': None,
    'last_name': 'Bloggs'
}

We want to store the first name if one is passed in, or an empty string if None is passed in.

first_name = account['first_name'] if account['first_name'] else ""

vs

first_name = account['first_name'] or ""

Both of these work. However, what is the difference behind the scenes? Is one more efficient than the other?


asked Aug 11 '18 19:08

user7692855

1 Answer

What is the difference between the two following expressions?

first_name = account['first_name'] if account['first_name'] else ""

vs

first_name = account['first_name'] or ""

The primary difference is that the first is Python's conditional expression,

The expression x if C else y first evaluates the condition, C rather than x. If C is true, x is evaluated and its value is returned; otherwise, y is evaluated and its value is returned.

while the second is a boolean operation:

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
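To make the two evaluation orders concrete, here is a small sketch. The `trace` helper and the `order` list are hypothetical names introduced only for this illustration: `trace` records which operand is being evaluated and then returns the value it was given.

```python
# Hypothetical tracing helper: records the order in which operands are
# evaluated, then returns the given value unchanged.
order = []

def trace(label, value):
    order.append(label)
    return value

# Conditional expression: C is evaluated first, then x; y is skipped.
order.clear()
result = trace("x", "Joe") if trace("C", "Joe") else trace("y", "")
print(order, repr(result))  # ['C', 'x'] 'Joe'

# Boolean `or`: x is evaluated once and its own value (not True) is returned.
order.clear()
result = trace("x", "Joe") or trace("y", "")
print(order, repr(result))  # ['x'] 'Joe'

# With a falsy x, `or` falls through to y and returns y's value.
order.clear()
result = trace("x", None) or trace("y", "")
print(order, repr(result))  # ['x', 'y'] ''
```

Note that `or` returns one of its operands rather than `True` or `False`, which is what makes it usable for supplying a default like this.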

Note that the first may require two key lookups, whereas the second only ever requires one.

This lookup is called subscript notation:

name[subscript_argument]

Subscript notation calls the __getitem__ method of the object referenced by name.

It requires both the name and the subscript argument to be loaded.

In the context of the question, if the value tests as True in a boolean context (a non-empty string does, None does not), the conditional expression has to load both the dictionary and the key a second time and repeat the (redundant) lookup, whereas the boolean or operation simply returns the result of the first lookup.

Therefore, I would expect the second, the boolean operation, to be slightly more efficient whenever the value is truthy (here, a non-empty string rather than None).
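One way to observe the extra lookup directly is to count calls to `__getitem__`. The sketch below uses `CountingDict`, a hypothetical `dict` subclass written only for this illustration, and wraps each expression in a small function (`conditional` and `boolean_or`, also hypothetical names).

```python
class CountingDict(dict):
    """A dict subclass (for illustration only) that counts __getitem__ calls."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __getitem__(self, key):
        # Every subscript like account['first_name'] ends up here.
        self.lookups += 1
        return super().__getitem__(key)


def conditional(account):
    return account['first_name'] if account['first_name'] else ""


def boolean_or(account):
    return account['first_name'] or ""


def count_lookups(func, value):
    """Run func on a fresh CountingDict and report how many lookups it made."""
    account = CountingDict(first_name=value, last_name='Bloggs')
    func(account)
    return account.lookups


print(count_lookups(conditional, 'Joe'), count_lookups(boolean_or, 'Joe'))  # 2 1
print(count_lookups(conditional, None), count_lookups(boolean_or, None))    # 1 1
```

With a truthy value the conditional expression performs two lookups and the boolean operation one; with None both perform a single lookup.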

Abstract Syntax Tree (AST) breakdown

Others have compared the bytecode generated by both expressions.

However, the AST is the first representation of the code that the interpreter builds when it parses it.

The following AST shows that the conditional expression contains the same Subscript node twice, once as the test and once as the body, so it likely involves more work (note that I have formatted the output for readability):

>>> import ast
>>> print(ast.dump(ast.parse("account['first_name'] if account['first_name'] else ''").body[0]))
Expr(
    value=IfExp(
        test=Subscript(value=Name(id='account', ctx=Load()),
                       slice=Index(value=Str(s='first_name')), ctx=Load()),
        body=Subscript(value=Name(id='account', ctx=Load()),
                       slice=Index(value=Str(s='first_name')), ctx=Load()),
        orelse=Str(s='')
))

versus

>>> print(ast.dump(ast.parse("account['first_name'] or ''").body[0]))
Expr(
    value=BoolOp(
        op=Or(),
        values=[
            Subscript(value=Name(id='account', ctx=Load()),
                      slice=Index(value=Str(s='first_name')), ctx=Load()),
            Str(s='')]
    )
)
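The dumps above come from an older Python; on Python 3.9 and later, `ast.dump` shows `Constant` nodes instead of `Str` and no longer shows the `Index` wrapper. A version-tolerant way to make the same point is to count the `Subscript` nodes in each tree. This is a sketch: `count_subscripts` is a hypothetical helper, and the `indent` argument to `ast.dump` assumes Python 3.9 or later.

```python
import ast

# mode="eval" parses a single expression and returns an ast.Expression node.
cond_tree = ast.parse("account['first_name'] if account['first_name'] else ''", mode="eval")
or_tree = ast.parse("account['first_name'] or ''", mode="eval")


def count_subscripts(tree):
    """Count the Subscript nodes (the dictionary lookups) anywhere in a tree."""
    return sum(isinstance(node, ast.Subscript) for node in ast.walk(tree))


print(count_subscripts(cond_tree))  # 2
print(count_subscripts(or_tree))    # 1

# Pretty-printed dump; the indent argument needs Python 3.9+.
print(ast.dump(or_tree, indent=4))
```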

Bytecode analysis

Here we see that the bytecode for the conditional expression is much longer. This usually bodes poorly for relative performance in my experience.

>>> import dis   
>>> dis.dis("d['name'] if d['name'] else ''")
  1           0 LOAD_NAME                0 (d)
              2 LOAD_CONST               0 ('name')
              4 BINARY_SUBSCR
              6 POP_JUMP_IF_FALSE       16
              8 LOAD_NAME                0 (d)
             10 LOAD_CONST               0 ('name')
             12 BINARY_SUBSCR
             14 RETURN_VALUE
        >>   16 LOAD_CONST               1 ('')
             18 RETURN_VALUE

For the boolean operation, the bytecode is considerably shorter (six instructions versus ten):

>>> dis.dis("d['name'] or ''")
  1           0 LOAD_NAME                0 (d)
              2 LOAD_CONST               0 ('name')
              4 BINARY_SUBSCR
              6 JUMP_IF_TRUE_OR_POP     10
              8 LOAD_CONST               1 ('')
        >>   10 RETURN_VALUE
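Opcode names and offsets vary between Python versions, so these listings may not match your interpreter exactly. A sketch of a version-tolerant check is to count how many times `d` is loaded in each compiled expression. `count_loads` is a hypothetical helper, and the check assumes the source is compiled in `"eval"` mode, where a free name like `d` is loaded with `LOAD_NAME`.

```python
import dis


def count_loads(source, name="d"):
    """Compile an expression and count how often `name` is loaded."""
    code = compile(source, "<expr>", "eval")
    return sum(
        1
        for ins in dis.get_instructions(code)
        if ins.opname == "LOAD_NAME" and ins.argval == name
    )


print(count_loads("d['name'] if d['name'] else ''"))  # 2
print(count_loads("d['name'] or ''"))                 # 1
```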

Here I would expect the boolean operation to perform better than the conditional expression.

So let's see how much difference there is in practice.

Performance

Performance is not very important here, but sometimes I have to see for myself:

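import timeit

# Each factory returns a zero-argument callable for timeit.
# The 'name' key is always present; only its value ('thename' or None) varies.
def cond(name=False):
    d = {'name': 'thename' if name else None}
    return lambda: d['name'] if d['name'] else ''

def bool_op(name=False):
    d = {'name': 'thename' if name else None}
    return lambda: d['name'] or ''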

We see that when the name is set (its value is 'thename'), the boolean operation is about 10% faster than the conditional expression.

>>> min(timeit.repeat(cond(name=True), repeat=10))
0.11814919696189463
>>> min(timeit.repeat(bool_op(name=True), repeat=10))
0.10678509017452598

However, when the value is None, there is almost no difference:

>>> min(timeit.repeat(cond(name=False), repeat=10))
0.10031125508248806
>>> min(timeit.repeat(bool_op(name=False), repeat=10))
0.10030031995847821

A note on correctness

In general, I would prefer the or boolean operation to the conditional expression, but only when both of the following hold:

The dictionary is guaranteed to contain only non-empty strings or None.
Performance here is critical.

If either of these does not hold, I would prefer the following for correctness:

first_name = account['first_name']
if first_name is None:
    first_name = ''

The upsides are that

the lookup is done one time,
the check for is None is quite fast,
the code is explicitly clear, and
the code is easily maintainable by any Python programmer.
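To illustrate why the first caveat matters, the sketch below runs all three approaches over some falsy values. The function names are hypothetical, and it assumes, purely for illustration, that the value could be something other than a string or None, such as 0 or False.

```python
def with_conditional(value):
    return value if value else ''

def with_or(value):
    return value or ''

def with_is_none(value):
    # The explicit approach above: replace only None.
    first_name = value
    if first_name is None:
        first_name = ''
    return first_name

for value in [None, '', 0, False, 'Joe']:
    print(repr(value), repr(with_conditional(value)),
          repr(with_or(value)), repr(with_is_none(value)))
# 0 and False become '' with the first two approaches,
# but with_is_none keeps them unchanged.
```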

I would not expect this to be much less performant:

def correct(name=False):
    d = {'name': 'thename' if name else None}
    def _correct():
        first_name = d['name']
        if first_name is None:
            first_name = ''
        return first_name  # return the result, as the lambdas above do
    return _correct

We see that we get quite competitive performance when the value is set:

>>> min(timeit.repeat(correct(name=True), repeat=10))
0.10948465298861265
>>> min(timeit.repeat(cond(name=True), repeat=10))
0.11814919696189463
>>> min(timeit.repeat(bool_op(name=True), repeat=10))
0.10678509017452598

When the value is None, however, it is not quite as good: roughly 17% slower than the other two.

>>> min(timeit.repeat(correct(name=False), repeat=10))
0.11776355793699622
>>> min(timeit.repeat(cond(name=False), repeat=10))
0.10031125508248806
>>> min(timeit.repeat(bool_op(name=False), repeat=10))
0.10030031995847821
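To reproduce these comparisons in a single run, here is a minimal script sketch. It assumes the same setup as above (timeit's default of one million calls per run, and repeat=10); `make_dict` and `benchmark` are hypothetical names, and `correct` returns its result so that all three callables do equivalent work. Absolute timings will differ across machines and Python versions.

```python
import timeit


def make_dict(name):
    # Hypothetical helper: the key is always present; only its value varies.
    return {'name': 'thename' if name else None}


def cond(name=False):
    d = make_dict(name)
    return lambda: d['name'] if d['name'] else ''


def bool_op(name=False):
    d = make_dict(name)
    return lambda: d['name'] or ''


def correct(name=False):
    d = make_dict(name)
    def _correct():
        first_name = d['name']
        if first_name is None:
            first_name = ''
        return first_name
    return _correct


def benchmark(number=1_000_000, repeat=10):
    """Return the best time for each (approach, name-is-set) combination."""
    results = {}
    for builder in (cond, bool_op, correct):
        for name in (True, False):
            times = timeit.repeat(builder(name=name), number=number, repeat=repeat)
            results[(builder.__name__, name)] = min(times)
    return results
```

For example, `for key, best in benchmark().items(): print(key, best)` prints the best time for each combination.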

Conclusion

The difference between the conditional expression and the boolean operation is that, when the value is truthy, the former performs two lookups and the latter only one, which makes the boolean operation more performant.

For correctness's sake, however, do the lookup once, check whether the result is None with is None, and reassign the empty string in that case.


answered Nov 16 '22 00:11

Russia Must Remove Putin
