When parsing a freeform language like C, it is easy for the parser to determine when several expressions are related to one another simply by looking at the tokens emitted by the scanner. For example, in the code
if (x == 5) { a = b; c = d; }
the parser can tell that a = b; and c = d; are part of the same block statement because they're surrounded by braces. This could easily be encoded as a CFG using something like this:
STMT ::= IF_STMT | EXPR ; | BLOCK_STMT | STMT STMT
IF_STMT ::= if ( EXPR ) STMT
BLOCK_STMT ::= { STMT }
In Python and other whitespace-sensitive languages, though, it's not as easy to do this because the structure of the statements can only be inferred from their absolute position, which I don't think can easily be encoded into a CFG. For example, the above code in Python would look like this:
if x == 5:
    a = b
    c = d
Try as I might, I can't see a way to write a CFG that would accept this, because I can't figure out how to encode "two statements at the same level of nesting" into a CFG.
How do Python parsers group statements as they do? Do they rely on a scanner that automatically inserts extra tokens denoting starts and ends of statements? Do they produce a rough AST for the program, then have an extra pass that assembles statements based on their indentation? Is there a clever CFG for this problem that I'm missing? Or do they use a more powerful parser than a standard LL(1) or LALR(1) parser that's able to take whitespace level into account?
Indentation is central to Python: if a block is not indented correctly, the interpreter raises an IndentationError and the code is not compiled.
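As a small illustration of that behaviour, the sketch below feeds a few snippets to the built-in compile() function and reports whether each one is rejected. The helper name check and the sample strings are made up for this example.

```python
def check(source):
    """Return "ok" if source compiles, else the name of the indentation error."""
    try:
        # compile() only tokenizes and parses the text; nothing is executed,
        # so undefined names such as x, a and b do not matter here.
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return type(exc).__name__
    return "ok"


well_formed = "if x == 5:\n    a = b\n    c = d\n"
missing_indent = "if x == 5:\na = b\n"               # body is not indented
bad_dedent = "if x == 5:\n    a = b\n  c = d\n"      # matches no outer level

for sample in (well_formed, missing_indent, bad_dedent):
    print(check(sample))
```

Both malformed samples are rejected before any code runs, which suggests the indentation check happens while the source is being tokenized and parsed rather than at run time.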
Parsing with Python (as opposed to parsing Python source itself) can be done in various ways: the parser module (removed in Python 3.10), regular expressions, string methods such as split() and strip(), or, for tabular data, reading a CSV file with pandas.
Benefits of Indentation in Python
In Python, indentation is used for grouping statements, so the layout of the code always reflects its structure. The indentation rules are simple, and most Python IDEs indent the code for you automatically, so it's easy to write properly indented code.
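Indentation is handled with two "pseudo tokens", INDENT and DEDENT, which the tokenizer generates rather than the parser. The tokenizer keeps a stack of indentation levels: when a line is indented more deeply than the level on top of the stack, it pushes the new level and emits an INDENT; when a line is indented less, it pops levels and emits one DEDENT for each until the levels match. The parser then treats INDENT and DEDENT much like the opening and closing braces in C, so the grammar itself stays context-free. For more information, see the lexical analysis chapter of the Python language reference and the source of the Python tokenizer and parser.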
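A minimal sketch of that idea follows. It is not CPython's tokenizer: the function name indent_tokens and the LINE token kind are invented for the example, tabs, comments and line continuations are ignored, and each non-blank line is passed through as a single token.

```python
def indent_tokens(source):
    """Yield (kind, text) pairs, adding INDENT/DEDENT pseudo tokens."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:            # blank lines do not affect block structure
            continue
        width = len(line) - len(stripped)
        if width > stack[-1]:       # deeper than the current block: open a new one
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:    # shallower: close blocks until a level matches
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:      # landed between two known levels
            raise IndentationError(
                "unindent does not match any outer indentation level")
        yield ("LINE", stripped)
    while len(stack) > 1:           # close any blocks still open at end of input
        stack.pop()
        yield ("DEDENT", "")


example = "if x == 5:\n    a = b\n    c = d\ne = f\n"
for kind, text in indent_tokens(example):
    print(kind, text)
```

With a token stream like this, a block can be described by an ordinary production such as BLOCK_STMT ::= INDENT STMT DEDENT, the same shape as the brace-based rule for C. The standard library's tokenize module exposes the real INDENT and DEDENT tokens if you want to inspect what the actual tokenizer produces.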