So I understand that:
The end of a logical line is represented by the token NEWLINE
This means that, the way Python's grammar is defined, the only way to end a logical line is with a \n token.
The same goes for physical lines (or rather an EOL, which is the end-of-line sequence of the platform you're using when writing the file, but which is nevertheless converted to a universal \n by Python).
A logical line may be made up of one or more physical lines, but usually it's one, and most of the time it's one if you write clean code.
In the sense that:
foo = 'some_value' # 1 logical line = 1 physical
foo, bar, baz = 'their', 'corresponding', 'values' # 1 logical line = 1 physical
some_var, another_var = 10, 10; print(some_var, another_var); some_fn_call()
# the above is still 1 logical line = 1 physical line
# because ; is not a terminator per se but a delimiter
# since Python doesn't use EBNF exactly but rather a modified form of BNF
# P.S. one should never write code like the last line; it's just for educational purposes
Without showing examples of how 1 logical is equivalent to > 1 physical, my question is the following part from the docs:
Statements cannot cross logical line boundaries except where NEWLINE is allowed by the syntax (e.g., between statements in compound statements)
But what does this even mean? I understand the list of compound statements (if, while, for, etc.): they are all made up of one or more clauses, and each clause, in turn, is made up of a header and a suite. The suite is made up of one or more statements. Let's take an example to be more specific:
So the if statement is something like this according to the grammar (excluding the elifs and else clauses):
if_stmt ::= "if" expression ":" suite
where the suite and its subsequent statements:
suite ::= stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement ::= stmt_list NEWLINE | compound_stmt
stmt_list ::= simple_stmt (";" simple_stmt)* [";"]
so this means that if you want you can choose (given by "|") your suite to be 1 of 2 ways:
on the same line:
disadvantages: not pythonic and you cannot have another compound statement that introduces a new block (like a func def, another if, etc)
advantages: it's a one-liner, I guess
example:
if 'truthy_string': foo, bar, baz = 1, 2, 3; print('whatever'); call_some_fn();
introduce a new block:
advantages: all, and the proper way to do it
example:
if 'truthy_value':
    first_stmt = 5
    second_stmt = 10
    a, b, c = 1, 2, 3
    func_call()
    result = inception(nested(calls(one_param), another_param), yet_another)
but I don't see how
Statements cannot cross logical line boundaries except where NEWLINE is allowed by the syntax
What I see above is a suite, which is a block of code controlled by the if clause, and in turn, that suite, is made up of logical, independent lines (statements), where each logical line is one physical line (coincidentally). I don't see how one logical line can cross the boundaries (which basically is just a fancy word for the end, the limit, which is newline), I don't see how one statement can cross those boundaries and span into the next statement, or maybe I'm really confused and have everything mixed up, but if someone can please explain.
Thank you for your time in advance.
The lexer splits the code into tokens (keywords, identifiers, numbers, etc.), and the parser assembles the tokens into an abstract syntax tree. Most of the white space magic is in the lexer, which emits three special tokens: NEWLINE , INDENT , and DEDENT .
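To make the three special tokens visible, here is a minimal sketch using the standard-library tokenize module. The sample source and the helper name token_names are just illustrative choices, not anything from the docs.

```python
import io
import tokenize

# Illustrative sample: one compound statement with a two-statement suite.
SOURCE = "if 1:\n    100\n    200\n"

def token_names(source):
    """Return the token type names the tokenizer produces for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]

names = token_names(SOURCE)
print(names)
```

Each logical line ends in its own NEWLINE token, while INDENT and DEDENT bracket the indented block.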
Python raw string is created by prefixing a string literal with 'r' or 'R'. Python raw string treats backslash (\) as a literal character. This is useful when we want to have a string that contains backslash and don't want it to be treated as an escape character.
In Python strings, the backslash "\" is a special character, also called the "escape" character. It is used in representing certain whitespace characters: "\t" is a tab, "\n" is a newline, and "\r" is a carriage return. Conversely, prefixing a special character with "\" turns it into an ordinary character.
Fortunately there is a Full Grammar specification in the Python documentation.
A statement is defined in that specification as:
stmt: simple_stmt | compound_stmt
And a logical line is delimited by NEWLINE (that's not in the specification but based on your question).
Okay, let's go through this. What's the specification for a simple_stmt?
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
             import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
Okay, now it goes down several different paths, and it probably doesn't make sense to go through all of them separately, but based on the specification a simple_stmt could cross logical line boundaries if any of the small_stmts contained a NEWLINE (currently they don't, but they could).
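As a rough check of the simple_stmt rule, the sketch below (my own illustration, with a made-up sample line and helper name) counts the statements the parser sees and the NEWLINE tokens the tokenizer emits for a single line containing semicolons.

```python
import ast
import io
import tokenize

# Illustrative sample: three small statements on one physical line.
LINE = "a = 1; b = 2; print(a, b)\n"

def count_newline_tokens(source):
    """Count NEWLINE tokens, i.e. the ends of logical lines."""
    readline = io.StringIO(source).readline
    return sum(tok.type == tokenize.NEWLINE
               for tok in tokenize.generate_tokens(readline))

statements = ast.parse(LINE).body       # parsed only, never executed
newline_tokens = count_newline_tokens(LINE)
print(len(statements), newline_tokens)
```

Several small statements share one logical line here: the semicolons separate them, and a single NEWLINE closes the whole simple_stmt.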
Apart from that purely theoretical possibility, there is the compound_stmt:
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated | async_stmt
[...]
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
[...]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
I picked only the if statement and suite because that already suffices. The if statement, including elif and else and all of the content in these, is one statement (a compound statement). And because it may contain NEWLINEs (if the suite isn't just a simple_stmt), it already fulfills the requirement of "a statement that crosses logical line boundaries".
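One way to see this without reading the grammar is to ask the ast module how many top-level statements it finds. This is only a sketch with an invented sample; end_lineno assumes Python 3.8 or newer.

```python
import ast

# Illustrative sample: an if/elif/else spread over six logical lines.
SOURCE = (
    "if x:\n"
    "    a = 1\n"
    "elif y:\n"
    "    a = 2\n"
    "else:\n"
    "    a = 3\n"
)

tree = ast.parse(SOURCE)        # parsed only, so x and y need not exist
if_node = tree.body[0]
print(len(tree.body), type(if_node).__name__,
      if_node.lineno, if_node.end_lineno)
```

The module body holds a single If node that starts on line 1 and ends on line 6, so the one statement spans every logical line in the sample.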
For example, this if statement (schematic):
if 1:
    100
    200
would be:
if_stmt
|---> test --> 1
|---> NEWLINE
|---> INDENT
|---> expr_stmt --> 100
|---> NEWLINE
|---> expr_stmt --> 200
|---> NEWLINE
|---> DEDENT
And all of this belongs to the if statement (and it's not just a block "controlled" by the if or while, ...).
if with parser, symbol and token
A way to visualize that would be using the built-in parser, token and symbol modules (really, I didn't know about these modules before I wrote this answer):
import symbol
import parser
import token

s = """
if 1:
    100
    200
"""
st = parser.suite(s)

def recursive_print(inp, level=0):
    for idx, item in enumerate(inp):
        if isinstance(item, int):
            # map numeric IDs to grammar symbol or token names
            print('.'*level, symbol.sym_name.get(item, token.tok_name.get(item, item)), sep="")
        elif isinstance(item, list):
            recursive_print(item, level+1)
        else:
            print('.'*level, repr(item), sep="")

recursive_print(st.tolist())
Actually, I cannot explain most of the parser result, but it shows (if you remove a lot of unnecessary lines) that the suite, including its newlines, really belongs to the if_stmt. Indentation represents the "depth" of the parser at a specific point.
file_input
.stmt
..compound_stmt
...if_stmt
....NAME
....'if'
....test
.........expr
...................NUMBER
...................'1'
....COLON
....suite
.....NEWLINE
.....INDENT
.....stmt
...............expr
.........................NUMBER
.........................'100'
.......NEWLINE
.....stmt
...............expr
.........................NUMBER
.........................'200'
.......NEWLINE
.....DEDENT
.NEWLINE
.ENDMARKER
That could probably be made much more beautiful, but I hope it serves as an illustration even in its current form.
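Note that the parser and symbol modules were removed in Python 3.10, so the snippet above only runs on older interpreters. As a rough substitute, assuming the abstract syntax tree is close enough for this purpose (it does not show NEWLINE, INDENT or DEDENT), the sketch below prints a similar dotted outline with the ast module. The outline helper is my own, and the node names assume Python 3.8 or newer.

```python
import ast

SOURCE = "if 1:\n    100\n    200\n"

def outline(node, level=0):
    """Return one line per AST node, prefixed with dots for its depth."""
    lines = ['.' * level + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, level + 1))
    return lines

tree_lines = outline(ast.parse(SOURCE))
print("\n".join(tree_lines))
```

Both expression statements appear nested under the If node, which matches the point above: the suite is part of the if statement itself.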