It is very similar to this:
How to tell if a string contains valid Python code
The only difference being instead of the entire program being given altogether, I am interested in a single line of code at a time.
Formally, we say a line of python is "syntactically valid" if there exists any syntactically valid python program that uses that particular line.
For instance, I would like to identify these as syntactically valid lines:
for i in range(10):
x = 1
Because one can use these lines in some syntactically valid python programs.
I would like to identify these lines as syntactically invalid lines:
for j in range(10 in range(10(
x =++-+ 1+-
Because no syntactically correct python programs could ever use these lines
The check does not need to be too strict, it just need to be good enough to filter out obviously bogus statements (like the ones shown above). The line is given as a string, of course.
This uses codeop.compile_command
to attempt to compile the code. This is the same logic that the code
module does to determine whether to ask for another line or immediately fail with a syntax error.
import codeop
def is_valid_code(line):
try:
codeop.compile_command(line)
except SyntaxError:
return False
else:
return True
It can be used as follows:
>>> is_valid_code('for i in range(10):')
True
>>> is_valid_code('')
True
>>> is_valid_code('x = 1')
True
>>> is_valid_code('for j in range(10 in range(10(')
True
>>> is_valid_code('x = ++-+ 1+-')
False
I'm sure at this point, you're saying "what gives? for j in range(10 in range(10(
was supposed to be invalid!" The problem with this line is that 10()
is technically syntactically valid, at least according to the Python interpreter. In the REPL, you get this:
>>> 10()
Traceback (most recent call last):
File "<pyshell#22>", line 1, in <module>
10()
TypeError: 'int' object is not callable
Notice how this is a TypeError
, not a SyntaxError
. ast.parse
says it is valid as well, and just treats it as a call with the function being an ast.Num
.
These kinds of things can't easily be caught until they actually run.
If some kind of monster managed to modify the value of the cached 10
value (which would technically be possible), you might be able to do 10()
. It's still allowed by the syntax.
What about the unbalanced parentheses? This fits the same bill as for i in range(10):
. This line is invalid on its own, but may be the first line in a multi-line expression. For example, see the following:
>>> is_valid_code('if x ==')
False
>>> is_valid_code('if (x ==')
True
The second line is True
because the expression could continue like this:
if (x ==
3):
print('x is 3!')
and the expression would be complete. In fact, codeop.compile_command
distinguishes between these different situations by returning a code object if it's a valid self-contained line, None
if the line is expected to continue for a full expression, and throwing a SyntaxError
on an invalid line.
However, you can also get into a much more complicated problem than initially stated. For example, consider the line )
. If it's the start of the module, or the previous line is {
, then it's invalid. However, if the previous line is (1,2,
, it's completely valid.
The solution given here will work if you only work forward, and append previous lines as context, which is what the code module does for an interactive session. Creating something that can always accurately identify whether a single line could possibly exist in a Python file without considering surrounding lines is going to be extremely difficult, as the Python grammar interacts with newlines in non-trivial ways. This answer responds with whether a given line could be at the beginning of a module and continue on to the next line without failing.
It would be better to identify what the purpose of recognizing single lines is and solve that problem in a different way than trying to solve this for every case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With