
Why does Python return [15] for [0xfor x in (1, 2, 3)]? [duplicate]

When running the following line:

>>> [0xfor x in (1, 2, 3)] 

I expected Python to return an error.

Instead, the REPL returns:

[15]

What can possibly be the reason?

Asked by Yam Mesicka on Apr 13 '21 at 22:04



1 Answer

TL;DR

Python reads the expression as [0xf or (x in (1, 2, 3))], because of:

  1. How the Python tokenizer splits the input.
  2. Operator precedence.

It never raises NameError thanks to short-circuit evaluation: if the expression to the left of the or operator is truthy, Python never evaluates the right side.

Parsing hexadecimal numbers

First, we have to understand how Python reads hexadecimal numbers.

In tokenizer.c's huge tok_get function, the tokenizer:

  1. Finds the leading 0x.
  2. Keeps reading the following characters as long as they are hexadecimal digits (0-9, a-f).

The resulting token is 0xf, because "o" is not a hexadecimal digit. It is eventually passed to the PEG parser, which converts it to the integer 15 (see Appendix A).
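The split can be observed from Python itself. The sketch below uses the standard-library tokenize module rather than tokenizer.c directly, on the assumption that it divides this input the same way; only the first few tokens are printed.

```python
import io
import tokenize

source = "[0xfor x in (1, 2, 3)]"

# generate_tokens expects a readline callable that yields str lines.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# The opening bracket, the hex literal, then `or` and `x` as separate names.
for name, text in tokens[:4]:
    print(name, repr(text))
```

The second token is the number 0xf and the third is the name or, which is how the keyword ends up separated from the literal.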

We still have to parse the rest of the code, or x in (1, 2, 3)], which leaves us with the following code:

[15 or x in (1, 2, 3)] 

Operator precedence

Because in has higher precedence than or, the expression groups as 15 or (x in (1, 2, 3)), and we might expect x in (1, 2, 3) to be evaluated first.

That would be a troublesome situation, as x doesn't exist and evaluating it would raise a NameError.
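The grouping can be checked without evaluating anything by comparing syntax trees. This is a small sketch; a, b and c are placeholder names chosen for illustration, and parsing alone never looks them up.

```python
import ast

# Parentheses leave no trace in the AST, so equal dumps mean equal grouping.
implicit = ast.dump(ast.parse("a or b in c", mode="eval"))
explicit = ast.dump(ast.parse("a or (b in c)", mode="eval"))
other = ast.dump(ast.parse("(a or b) in c", mode="eval"))

print(implicit == explicit)  # in binds tighter than or
print(implicit == other)     # the alternative grouping is a different tree
```

Precedence only decides how the operands are grouped. The order in which they are evaluated is a separate matter.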

or is lazy

Fortunately, Python uses short-circuit evaluation, and or is a lazy operator: operands are evaluated left to right, and if the left operand is truthy, Python won't bother evaluating the right operand.

We can see it using the ast module:

    import ast

    parsed = ast.parse('0xfor x in (1, 2, 3)', mode='eval')
    print(ast.dump(parsed))

Output:

    Expression(
        body=BoolOp(
            op=Or(),
            values=[
                Constant(value=15),   # <-- Truthy value, so the next operand won't be evaluated.
                Compare(
                    left=Name(id='x', ctx=Load()),
                    ops=[In()],
                    comparators=[
                        Tuple(elts=[Constant(value=1), Constant(value=2), Constant(value=3)], ctx=Load())
                    ]
                )
            ]
        )
    )

So the final expression is equal to [15].
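The short-circuit behaviour can also be observed at runtime. In this sketch, probe is a hypothetical helper that records which operands were actually evaluated, and undefined_name_for_demo is deliberately left undefined.

```python
def probe(label, value, log):
    """Record that an operand was evaluated, then return it unchanged."""
    log.append(label)
    return value

# Truthy left operand: the right operand is never evaluated.
log = []
result = probe("left", 15, log) or probe("right", True, log)
print(result, log)  # 15 ['left']

# Falsy left operand: the right operand is evaluated and becomes the result.
log_falsy = []
result_falsy = probe("left", 0, log_falsy) or probe("right", True, log_falsy)
print(result_falsy, log_falsy)  # True ['left', 'right']

# The same rule explains why the undefined name is harmless after 15...
safe = [15 or undefined_name_for_demo]

# ...but not after a falsy value, where the lookup really happens.
try:
    [0 or undefined_name_for_demo]
    raised = False
except NameError:
    raised = True

print(safe, raised)  # [15] True
```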


Appendix A: The PEG parser

In pegen.c's parsenumber_raw function, we can find how Python treats numbers with a leading zero:

    if (s[0] == '0') {
        x = (long)PyOS_strtoul(s, (char **)&end, 0);
        if (x < 0 && errno == 0) {
            return PyLong_FromString(s, (char **)0, 0);
        }
    }

PyOS_strtoul is in Python/mystrtoul.c.
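The final argument 0 in the call above is the base, which asks the routine to infer the base from the literal's prefix. A rough Python-level analogy, not the same code path, is int() with base 0:

```python
# Base 0 tells int() to infer the base from the prefix of the literal.
hex_value = int("0xf", 0)
octal_value = int("0o17", 0)
binary_value = int("0b1111", 0)
print(hex_value, octal_value, binary_value)

# Unlike the tokenizer, int() needs the whole string to be one valid literal,
# so it rejects the trailing "or" instead of stopping before it.
try:
    int("0xfor", 0)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```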

Inside mystrtoul.c, the parser looks at the character right after the 0x. If it's a hexadecimal digit, Python sets the base of the number to 16:

    if (*str == 'x' || *str == 'X') {
        /* there must be at least one digit after 0x */
        if (_PyLong_DigitValue[Py_CHARMASK(str[1])] >= 16) {
            if (ptr)
                *ptr = (char *)str;
            return 0;
        }
        ++str;
        base = 16;
    }
    ...

Then it parses the rest of the number as long as the characters are valid digits for that base (0-9 and a-f for base 16):

    while ((c = _PyLong_DigitValue[Py_CHARMASK(*str)]) < base) {
        if (ovlimit > 0) /* no overflow check required */
            result = result * base + c;
        ...
        ++str;
        --ovlimit;
    }

Eventually, it sets the pointer to where scanning stopped, which is one character past the last hexadecimal digit:

    if (ptr)
        *ptr = (char *)str;
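The scanning steps above can be sketched in Python. This is an illustration only, not CPython's implementation: scan_hex and digit_value are hypothetical names, overflow handling is omitted, and the (0, 1) result for a prefix with no digit after it assumes the C pointer sits on the x at that point.

```python
HEX_DIGITS = "0123456789abcdef"

def digit_value(ch):
    """Return the hex value of ch, or 16 (out of range) if it is not a hex digit."""
    index = HEX_DIGITS.find(ch.lower())
    return index if len(ch) == 1 and index >= 0 else 16

def scan_hex(text):
    """Return (value, end_index) for a string that starts with 0x or 0X."""
    if not text.lower().startswith("0x"):
        raise ValueError("expected a 0x prefix")
    # There must be at least one hex digit after 0x; otherwise only the
    # leading "0" counts and scanning stops at the "x" (assumed index 1).
    if len(text) < 3 or digit_value(text[2]) >= 16:
        return 0, 1
    value, pos = 0, 2
    # Accumulate digits until the first character that is not a hex digit.
    while pos < len(text) and digit_value(text[pos]) < 16:
        value = value * 16 + digit_value(text[pos])
        pos += 1
    return value, pos

print(scan_hex("0xf"))    # (15, 3)
print(scan_hex("0xfor"))  # (15, 3): scanning stops at "o"
```

Returning the end index alongside the value mirrors how the C routine reports, through ptr, where it stopped reading.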

Thanks

  • CSI_Tech_Dept from Reddit, for pointing me to the correct section of tokenizer.c.
  • The original Tweet.
Answered by Yam Mesicka on Sep 22 '22 at 08:09