Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is ternary operator implemented in Python

I understand that conditional expressions (or ternary operators) are lazy in Python. They represent conditional execution rather than conditional selection. In other words, only one of a or b is evaluated in the following:

c = a if condition else b

What I'm interested to know is how this is implemented internally. Does Python convert to an if statement as below and, if so, at what stage does this conversion occur?

if condition:
    c = a
else:
    c = b

Or is the ternary operator actually a distinct and separate expression, defined entirely separately? If so, can I access the CPython code for conditional expressions?

I've looked at the following which explain what the ternary operator does, but none of them make clear how they are implemented:

  • Does Python have a ternary conditional operator?
  • Putting a simple if-then-else statement on one line
  • python ? (conditional/ternary) operator for assignments
  • Is there an equivalent of C’s ”?:” ternary operator?
  • Conditional expressions

Edit: You can assume CPython reference implementation.

like image 944
jpp Avatar asked Sep 06 '18 10:09

jpp


2 Answers

Python doesn't have to convert anything, and couldn't if it wanted to.

The conditional expression is parsed by using the language grammar into an abstract syntax tree, which in turn is then compiled to bytecode. You can produce the AST by using the ast.parse() function:

>>> import ast
>>> ast.parse('c = a if condition else b').body[0]  # first statement in the tree
<_ast.Assign object at 0x10f05c550>
>>> ast.dump(ast.parse('c = a if condition else b').body[0])
"Assign(targets=[Name(id='c', ctx=Store())], value=IfExp(test=Name(id='condition', ctx=Load()), body=Name(id='a', ctx=Load()), orelse=Name(id='b', ctx=Load())))"

Note the ast.IfExp() node in the AST produced for the assignment; this is a dedicated node for conditional expressions. It has test, body and orelse parts to represent the 3 expressions that make up the condition, true and false parts. This is documented in the ast module Abstract Grammar section:

expr = [...]
     | [...]
     | IfExp(expr test, expr body, expr orelse)

This shows that the type of each element is another expr expression node.

The parse tree is then compiled to bytecode that uses the stack to conditionally jump to the right section based on the test; we can pass the AST produced by ast.parse() directly to the compile() function, after which the dis module lets us look at a human-friendly form of the bytecode produced by compilation:

>>> import dis
>>> dis.dis(compile(ast.parse('c = a if condition else b'), '', 'exec'))
  1           0 LOAD_NAME                0 (condition)
              2 POP_JUMP_IF_FALSE        8
              4 LOAD_NAME                1 (a)
              6 JUMP_FORWARD             2 (to 10)
        >>    8 LOAD_NAME                2 (b)
        >>   10 STORE_NAME               3 (c)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE

So if the condition is false, the interpreter loop jumps forward to instruction 8, otherwise instructions 4 and 6 are executed, with instruction 6 jumping forward to instruction 10 (so past the else expression). The end result is that either instruction 4 or instruction 8 puts a new result on the top of the stack for STORE_NAME to move to a variable.

An if statement results in a different AST node, and the resulting bytecode happens to be very similar in that it too would use jumps. But the compiler treats them as distinct pieces of syntax, and it has to.

Expressions and statements are two very different fundamental building blocks of a programming language. Statements can contain expressions, but expressions can't contain statements, only other expressions. And expressions can produce a value (for the surrounding syntax to use), but statements can't. So Python has to treat conditional expressions very differently from statements, in that the grammar parser knows when to expect a statement and when an expression is allowed. If you transformed a conditional expression into a statement, you would not be able to ever use such an expression as part of a bigger expression!

Because an if statement is not an expression, it doesn't return a value (as only expressions can produce a value), and so the resulting bytecode would not produce a value on the top of the stack to be used by the surrounding Python code (there is no c = if condition : ...). if statements contain a condition expression, and a suite, which must always consist of more statements (there is such a thing as an 'expression statement' to let you put just an expression in a statement, such as 1 + 1 on a single line), and those statements can 'do stuff' like assignments or return from a function, but nothing they do would ever make if return something.

This is reflected in the AST node definition for if statements:

stmt =  [...]
      | [...]
      | If(expr test, stmt* body, stmt* orelse)

So for an If node, test is the only expression node, and body and orelse both consist of zero or more statements. The orelse part would hold any elif ...: tests as further If() nodes, or any other type of statement to form an unconditional else:. With zero-or-more elements, you can't expect a single result.

So this isn't unique to CPython, this applies to all Python implementations. The Python grammar is not an implementation detail.

like image 193
Martijn Pieters Avatar answered Oct 06 '22 23:10

Martijn Pieters


Does Python convert to an if statement as below

Almost.

import dis

def trenary():
    x = 'a' if 1 == 1 else 'b'

def normal_if():
    if 1 == 1:
        c = 'a'
    else:
        c = 'b'

print('trenary')
dis.dis(trenary)
print()
print('normal if')
dis.dis(normal_if)

This outputs:

trenary
 68           0 LOAD_CONST               1 (1)
              2 LOAD_CONST               1 (1)
              4 COMPARE_OP               2 (==)
              6 POP_JUMP_IF_FALSE       12
              8 LOAD_CONST               2 ('a')
             10 JUMP_FORWARD             2 (to 14)
        >>   12 LOAD_CONST               3 ('b')
        >>   14 STORE_FAST               0 (x)
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE

normal if
 71           0 LOAD_CONST               1 (1)
              2 LOAD_CONST               1 (1)
              4 COMPARE_OP               2 (==)
              6 POP_JUMP_IF_FALSE       14

 72           8 LOAD_CONST               2 ('a')
             10 STORE_FAST               0 (c)
             12 JUMP_FORWARD             4 (to 18)

 74     >>   14 LOAD_CONST               3 ('b')
             16 STORE_FAST               0 (c)
        >>   18 LOAD_CONST               0 (None)
             20 RETURN_VALUE

These look almost the same, except for the location of JUMP_FORWARD and an additional STORE_FAST as pointed out by @L3viathan.

We also get almost the same execution times (with a negligible difference):

from timeit import Timer

print(min(Timer(trenary).repeat(5000, 5000)))
print(min(Timer(normal_if).repeat(5000, 5000)))
# 0.0006442809999998023
# 0.0006442799999994975

As to when this conversion happens, I'd assume sometime during "compilation" to bytecode.

like image 45
DeepSpace Avatar answered Oct 07 '22 00:10

DeepSpace