
What is the difference between NL and NEWLINE in tokenizer.py?

I'm trying to rewrite tokenizer.py in Java so I can parse Python from Java, but I don't understand the difference between NL and NEWLINE in the source. They seem to mean the same thing, but if they did, why would there be two tokens?

Octavia Togami asked Feb 11 '23 09:02


2 Answers

Some googling turned up this description of the NL token:

Token value used to indicate a non-terminating newline. The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines.

This comes from the tokenize documentation:

https://docs.python.org/2/library/tokenize.html

More in-depth information can be found here:

Python 2 newline tokens in tokenize module
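To see the distinction in practice, here is a small sketch that runs Python 3's standard tokenize module over a sample snippet and lists only the NL and NEWLINE tokens. The sample source and the helper name newline_tokens are made up for illustration; only tokenize.generate_tokens, tokenize.NL, tokenize.NEWLINE and tokenize.tok_name come from the library.

```python
import io
import tokenize

# Illustrative sample: a statement continued inside parentheses,
# a blank line, a comment-only line, and a plain statement.
source = (
    "x = (1 +\n"
    "     2)\n"
    "\n"
    "# comment\n"
    "y = 3\n"
)


def newline_tokens(src):
    """Return (token name, line number) for every NL and NEWLINE token."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in (tokenize.NL, tokenize.NEWLINE):
            result.append((tokenize.tok_name[tok.type], tok.start[0]))
    return result


for name, lineno in newline_tokens(source):
    print(lineno, name)
```

On this sample, lines 2 and 5 end a logical line and produce NEWLINE, while line 1 (inside the open parenthesis), the blank line 3 and the comment-only line 4 produce NL.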

marsh answered Feb 16 '23 02:02


In addition to marsh's answer, if you look at the code, you can see where the two differ at line 577 (the other occurrences of NL are all in (NEWLINE, NL) checks):

yield TokenInfo(NL if parenlev > 0 else NEWLINE,
       token, spos, epos, line)

where parenlev keeps track of the bracket nesting level:

if initial in '([{':
    parenlev += 1
elif initial in ')]}':
    parenlev -= 1

So NEWLINE marks the end of a statement (a logical line), while NL marks the end of a physical line that does not end the statement.
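For a port, the rule can be sketched in isolation. The function below is a hypothetical, heavily simplified stand-in and not the logic of tokenizer.py itself: it assumes no brackets or '#' characters inside string literals, and it ignores backslash continuations and indentation. Its depth variable plays the role of parenlev, and it also labels blank and comment-only lines as NL, which is what the tokenize module reports for them.

```python
def classify_line_ends(source):
    """Label each physical line end as 'NL' or 'NEWLINE'.

    Simplified sketch: assumes no brackets or '#' inside string
    literals and no backslash continuations.
    """
    labels = []
    depth = 0  # plays the role of parenlev
    for line in source.splitlines():
        # Naively drop a trailing comment before counting brackets.
        code = line.split("#", 1)[0]
        for ch in code:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
        # Inside brackets, or no code on the line: not a statement end.
        if depth > 0 or not code.strip():
            labels.append("NL")
        else:
            labels.append("NEWLINE")
    return labels


sample = "x = (1 +\n     2)\n\n# comment\ny = 3\n"
print(classify_line_ends(sample))
```

A faithful Java port would still need to handle strings, continuation lines and INDENT/DEDENT the way the original tokenizer does; this only isolates the NL versus NEWLINE decision.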

fredtantini answered Feb 16 '23 04:02