
How math operators are identified

How does a simple 2 ++ 2 work behind the scenes in the Python language?

If we type this into the Python interpreter:

>>> 2+++--2
4
>>> 2+++*2
  File "<stdin>", line 1
    2+++*2
        ^
SyntaxError: invalid syntax

Looking at this syntax error, I take it that the behaviour comes from the way Python was designed and implemented.

Python is open source, so I started exploring it further. I have read many articles about the CPython implementation.

So the Python compiler readily identifies characters such as + * % - as operators. I assume this is because it is written in C, and C is compiled to assembly and then to machine code.

Question 1: How is the Python compiler designed to identify operators? (I mean the lexing and parsing stages.)

Question 2: How can I modify this behaviour of the Python interpreter so that it throws a syntax error for repeated operators such as ++, the same way it already does for repeated multiplication operators?

>>> 2**2
4
>>> 2***2
  File "<stdin>", line 1
    2***2
       ^
SyntaxError: invalid syntax

I have read these CPython files: compile.c, parser.c, readline.c

But I didn't come across any file that deals with how syntax errors are raised.

Update:

I am still searching and waiting for answers to Question 2.

Shivkumar kondi asked Feb 02 '17 08:02


1 Answer

You've tripped over the difference between binary and unary operators. In the briefest of terms, -2 is literally the number "negative two". --2 is "negative (negative two)", or more conventionally "positive two". 2+++--2 is parsed as "two plus positive positive negative negative two", so it boils down to 2+2 and gives you 4. Both +2 and -2 are numbers, but *2 isn't, so that's why your syntax error happens.
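One way to check this grouping for yourself is to ask Python for the syntax tree it builds. The following is a minimal sketch, assuming a current Python 3 and its standard-library ast module; the helper names count_nodes and is_valid are made up for illustration.

```python
import ast

def count_nodes(source, node_type):
    """Count AST nodes of the given type in a single expression."""
    tree = ast.parse(source, mode="eval")
    return sum(isinstance(node, node_type) for node in ast.walk(tree))

def is_valid(source):
    """Return True if source parses as a Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False

expr = "2+++--2"
binary_ops = count_nodes(expr, ast.BinOp)    # the single binary +
unary_ops = count_nodes(expr, ast.UnaryOp)   # the stacked unary +, +, -, -
value = eval(expr)

print(binary_ops, unary_ops, value)  # 1 4 4
print(is_valid("2+++*2"))            # False
```

The tree contains one binary operation whose right-hand side is four nested unary operations wrapped around the literal 2.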

Read on if you want horrendous detail, but the first paragraph most directly answers your question.

You asked for detail, so here it comes. Programming languages are (usually...) defined by things called context-free grammars. The grammar of Python is described using Backus-Naur Form. From https://docs.python.org/2/reference/expressions.html#unary-arithmetic-and-bitwise-operations, we have the following definitions:

u_expr ::=  power | "-" u_expr | "+" u_expr | "~" u_expr

m_expr ::=  u_expr | m_expr "*" u_expr | m_expr "//" u_expr | m_expr "/" u_expr
            | m_expr "%" u_expr

a_expr ::=  m_expr | a_expr "+" m_expr | a_expr "-" m_expr

These rules define unary expressions, multiplicative expressions and arithmetic expressions in the Python language. I'm going to trim all three down to the bits that are directly relevant to our question before I attempt to explain them:

u_expr ::=  "2" | "-" u_expr | "+" u_expr

m_expr ::=  u_expr | m_expr "*" u_expr

a_expr ::=  m_expr | a_expr "+" m_expr | a_expr "-" m_expr

So, in this grammar, a u_expr is either 2, or it's the literal string + or - followed by any other u_expr, so the following all fit the definition of a u_expr: '2', '-2', '+2', '+-2', '++++---++2'.

An m_expr is either a u_expr, or it's an m_expr followed by a * followed by a u_expr. 2, 2*2, 2*+2, 2*++-+2 all fit this definition.

An a_expr is either an m_expr, or it's an a_expr followed by a plus or minus followed by an m_expr. 2, 2*2, 2+2, 2+2*2, 2++2*-2 and so on all fit this definition.

Now let's look at your first syntax error, 2+++*2. We're trying to turn this into an a_expr. It starts with 2+, so we must be looking for something of the form a_expr "+" m_expr. 2 is an a_expr and we've got our literal +, so to avoid a syntax error we have to somehow turn ++*2 into an m_expr. Every m_expr has to begin with a u_expr, and every u_expr is a run of + and - signs that ends in a "2". After the two + signs we hit a *, which cannot begin a u_expr, so parsing fails.

2+++--2, however, can be parsed as an a_expr. Specifically, 2 is an a_expr, followed by a literal +, followed by ++--2, which is an m_expr.
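The walk-through above can be turned into a small recognizer for the trimmed grammar. This is only a sketch of the idea, not how CPython's parser is written: the function name accepts is made up, the input is assumed to contain no whitespace, and the left-recursive rules (m_expr and a_expr) are rewritten as loops, which is a common trick in hand-written parsers.

```python
def accepts(text):
    """Return True if text is an a_expr in the trimmed grammar."""
    pos = 0

    def peek():
        # Next unread character, or None at the end of the input.
        return text[pos] if pos < len(text) else None

    def u_expr():
        # u_expr ::= "2" | "-" u_expr | "+" u_expr
        nonlocal pos
        while peek() in ("+", "-"):
            pos += 1
        if peek() != "2":
            return False
        pos += 1
        return True

    def m_expr():
        # m_expr ::= u_expr | m_expr "*" u_expr
        nonlocal pos
        if not u_expr():
            return False
        while peek() == "*":
            pos += 1
            if not u_expr():
                return False
        return True

    def a_expr():
        # a_expr ::= m_expr | a_expr "+" m_expr | a_expr "-" m_expr
        nonlocal pos
        if not m_expr():
            return False
        while peek() in ("+", "-"):
            pos += 1
            if not m_expr():
                return False
        return True

    # The whole input must be consumed for it to count as an a_expr.
    return a_expr() and pos == len(text)

print(accepts("2+++--2"))  # True
print(accepts("2+++*2"))   # False
```

For 2+++*2 the recognizer fails inside u_expr: it skips the two + signs, then finds * where it needs a 2.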

With regard to your second question about making 2***2 meaningful, I'm afraid that in Python you'd have to redefine what it actually means for a program to be valid Python. Looking at the docs I linked, you can see that every operator is explicitly defined, and for ** we have:

power ::=  primary ["**" u_expr]
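Before the grammar is applied, the source text is split into tokens, and ** arrives at the parser as a single token. A quick way to look at this, assuming Python 3 and the standard-library tokenize module (the helper name operator_tokens is made up for illustration):

```python
import io
import tokenize

def operator_tokens(source):
    """Return the operator strings the tokenizer finds in source."""
    readline = io.StringIO(source + "\n").readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.OP]

print(operator_tokens("2**2"))     # ['**']
print(operator_tokens("2***2"))    # ['**', '*']
print(operator_tokens("2+++--2"))  # ['+', '+', '+', '-', '-']
```

So 2***2 reaches the parser as 2, **, *, 2. The power rule wants a u_expr after **, and * cannot begin one, which is why the caret points at the third star. There is no ++ or -- token in Python, so every + and - in 2+++--2 is a separate token.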

Some languages like Haskell have a different idea of what something like 2+2 fundamentally means, and will let you define your own arbitrary operators. In such a language you could define a *** operator, but Python has no such facility without raising a PEP and fundamentally rewriting parts of Python.
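Short of changing the grammar, you could approximate the stricter behaviour from Question 2 outside the interpreter: parse normally, then reject any expression whose tree contains stacked operators. The sketch below assumes Python 3's ast module. The function name check_no_stacked_operators and the policy it enforces (a unary operator may not be the right operand of a binary operator, nor the operand of another unary operator) are hypothetical choices for illustration, not something CPython provides.

```python
import ast

def check_no_stacked_operators(source):
    """Parse an expression, raising SyntaxError on stacked operators."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        # e.g. 2++2: a binary + whose right operand is a unary +
        after_binary = (isinstance(node, ast.BinOp)
                        and isinstance(node.right, ast.UnaryOp))
        # e.g. --2: a unary - applied directly to another unary -
        after_unary = (isinstance(node, ast.UnaryOp)
                       and isinstance(node.operand, ast.UnaryOp))
        if after_binary or after_unary:
            raise SyntaxError("stacked operators in %r" % source)
    return tree

check_no_stacked_operators("2+2")       # accepted
try:
    check_no_stacked_operators("2++2")  # rejected by the check
except SyntaxError as exc:
    print(exc)
```

The tree does not record parentheses, so this check also rejects 2+(-2). It is a filter layered on top of the normal parser, not a change to the language itself.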

If you want more details, then you'll be straying into computer science rather than programming (yes, they are different). Get yourself started by looking up topics like Regular Languages, Finite-State Automata, Context-free Languages and the Chomsky Hierarchy.

ymbirtt answered Oct 23 '22 10:10