I am trying to parse Python function definitions with PLY, and I am running into problems with indentation. For a for statement, for instance, I would like to know where the block ends. I read the Python grammar here: http://docs.python.org/2/reference/grammar.html The relevant part of the grammar is:
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
I don't know how to describe the INDENT and DEDENT tokens with PLY. I was trying something like:
def t_indentation(t):
    r' |\t'
    # some special treatment for the indentation.
But it seems that PLY considers regexes containing spaces to match the empty string, and it does not build the lexer... Even if I managed to get the INDENT token, I am not sure how to get the DEDENT one...
Is there a way to do that with PLY?
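On the empty-string error: PLY compiles token rules with the re.VERBOSE flag, which discards unescaped whitespace in the pattern, so the rule in the question ends up as an empty alternative followed by a tab. The snippet below is a small sketch that reproduces this with the standard re module alone (no PLY involved); the names broken and fixed are only illustrative.

```python
import re

# The rule from the question, compiled with re.VERBOSE as PLY does.
# The unescaped space is dropped, leaving effectively '|\t',
# whose first alternative is empty.
broken = re.compile(r' |\t', re.VERBOSE)
print(broken.match('') is not None)

# Whitespace inside a character class survives re.VERBOSE,
# and '+' guarantees at least one character is consumed.
fixed = re.compile(r'[ \t]+', re.VERBOSE)
print(fixed.match('') is None)
print(repr(fixed.match('  \tx').group()))
```

Writing the space as [ ] or \s inside the PLY rule should avoid the error in the same way.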
How indentation is parsed in Python: at the beginning, the stack contains just the value 0, which is the leftmost position. Whenever a nested block begins, the new indentation level is pushed on the stack, and an "INDENT" token is inserted into the token stream which is passed to the parser. When a line is indented less than the level on top of the stack, levels are popped until the indentation matches, and a "DEDENT" token is inserted for each level popped.
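The stack bookkeeping can be sketched in plain Python, independent of PLY. This is a simplified illustration, not the tokenizer CPython uses: the function name indent_tokens and the LINE token type are made up for the example, NEWLINE tokens are left out, and indentation is assumed to consist of spaces only.

```python
def indent_tokens(lines):
    """Yield (type, value) pairs, inserting INDENT/DEDENT around LINE tokens."""
    stack = [0]  # starts with just 0, the leftmost position
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines do not affect indentation
        # Assumption: indentation is made of spaces only (tabs are not handled).
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # A nested block begins: push the new level and emit INDENT.
            stack.append(width)
            yield ("INDENT", width)
        else:
            # Pop every level deeper than this line, one DEDENT per level.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", width)
            if width != stack[-1]:
                raise IndentationError("unindent does not match any outer level")
        yield ("LINE", text)
    # At end of input, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", 0)


source = "for x in xs:\n    print(x)\n    if x:\n        pass\nprint('done')\n"
tokens = list(indent_tokens(source.splitlines()))
for tok in tokens:
    print(tok)
```

With PLY, the same bookkeeping can be applied to the lexer's token stream before it reaches the parser, so that the grammar only ever sees INDENT and DEDENT.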
Parsing in Python can be done in several ways: with the parser module, with regular expressions, with string methods such as split() and strip(), or with pandas, for example reading a CSV file with read_csv.
You can use lexer states to produce the INDENT and DEDENT (sometimes called UNDENT) tokens.
Example of parsing a Python-like language
PLY's examples include GardenSnake, which implements a subset of Python to demonstrate how to handle indentation:
https://github.com/dabeaz/ply/tree/1321375e013425958ea090b55aecae0a4b7face6/example/GardenSnake