
Why does Python return [15] for [0xfor x in (1, 2, 3)]? [duplicate]

Tags: python, python-3.x, operator-precedence, short-circuiting

When running the following line:

>>> [0xfor x in (1, 2, 3)]

I expected Python to return an error.

Instead, the REPL returns:

[15]

What can possibly be the reason?

217

asked Apr 13 '21 22:04

Yam Mesicka

1 Answer

TL;DR

Python reads the expression as [0xf or (x in (1, 2, 3))], because of:

1. The Python tokenizer, which splits 0xfor into the hexadecimal literal 0xf and the keyword or.
2. Operator precedence, which groups x in (1, 2, 3) as the right operand of or.

It never raises NameError thanks to short-circuit evaluation: if the expression to the left of the or operator is truthy, Python never evaluates the right side.

Parsing hexadecimal numbers

First, we have to understand how Python reads hexadecimal numbers.

In tokenizer.c's huge tok_get function, the tokenizer:

1. Finds the leading 0x.
2. Keeps reading the following characters as long as they are hexadecimal digits (0-9, a-f, A-F).

The resulting token is 0xf, because "o" is not a hexadecimal digit. It is eventually passed to the PEG parser, which converts it to the integer 15 (see Appendix A).

We still have to parse the rest of the code, or x in (1, 2, 3)], which leaves us with the following code:

[15 or x in (1, 2, 3)]
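
The split described in this section can be observed from Python itself. Below is a minimal sketch that uses the standard tokenize module as a stand-in for tokenizer.c. It is a Python-level view (and in some versions a separate implementation), so treat it as an illustration rather than the exact code path. The helper name token_strings is made up for this example. Warnings are silenced because recent Python versions warn about a numeric literal that is immediately followed by a keyword.

```python
import io
import tokenize
import warnings


def token_strings(source):
    """Return the text of every number, name and operator token in source."""
    wanted = (tokenize.NUMBER, tokenize.NAME, tokenize.OP)
    with warnings.catch_warnings():
        # Newer interpreters may warn about "0xfor"; ignore that here.
        warnings.simplefilter("ignore")
        readline = io.StringIO(source).readline
        return [
            tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted
        ]


tokens = token_strings("[0xfor x in (1, 2, 3)]")
print(tokens[:5])  # the hex literal and the keyword come out as separate tokens
```

On the interpreters this sketch was written against, the second and third tokens are 0xf and or, which matches the walkthrough above.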

Operator precedence

Because in has higher operator precedence than or, x in (1, 2, 3) is grouped as the right operand of or, and we might expect it to be evaluated first.

That would be a troublesome situation, as x doesn't exist and evaluating it would raise a NameError.
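
Precedence decides how operands are grouped, not when they are evaluated. As a rough check, the sketch below compares parse trees through ast.dump, on the assumption that dumps without position attributes are a fair way to compare grouping. The variable names are only for illustration.

```python
import ast


def tree(source):
    """Dump the parse tree of an expression, without line/column attributes."""
    return ast.dump(ast.parse(source, mode="eval"))


implicit = tree("15 or x in (1, 2, 3)")
grouped_right = tree("15 or (x in (1, 2, 3))")
grouped_left = tree("(15 or x) in (1, 2, 3)")

print(implicit == grouped_right)  # same grouping as the unparenthesized form
print(implicit == grouped_left)   # a different tree
```

Parsing alone never looks up x, so no NameError can occur at this stage.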

`or` is lazy

Fortunately, Python supports short-circuit evaluation, and or is a lazy operator: if the left operand is truthy, Python returns it and never evaluates the right operand.
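
A small sketch of that behavior follows. The function name left_or_missing and the undefined name inside it are hypothetical and exist only for this demonstration.

```python
def left_or_missing(left):
    # this_name_is_not_defined is deliberately never assigned anywhere,
    # so evaluating it raises NameError.
    return left or this_name_is_not_defined


# Truthy left operand: the right operand is skipped, no NameError.
print(left_or_missing(15))

# Falsy left operand: the right operand is evaluated and the lookup fails.
try:
    left_or_missing(0)
except NameError as error:
    print("NameError:", error)
```

The same mechanism protects the original expression: 15 is truthy, so x is never looked up.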

We can see how Python parses the expression using the ast module:

    import ast

    parsed = ast.parse('0xfor x in (1, 2, 3)', mode='eval')
    print(ast.dump(parsed))

Output (reformatted for readability):

    Expression(
        body=BoolOp(
            op=Or(),
            values=[
                Constant(value=15),   # <-- Truthy value, so the next operand won't be evaluated.
                Compare(
                    left=Name(id='x', ctx=Load()),
                    ops=[In()],
                    comparators=[
                        Tuple(elts=[Constant(value=1), Constant(value=2), Constant(value=3)], ctx=Load())
                    ]
                )
            ]
        )
    )

So the final expression is equal to [15].
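
As an end-to-end check, the sketch below evaluates the original line in an empty namespace. It assumes an interpreter that still accepts a keyword directly after a numeric literal; recent versions warn about it, so the warning is silenced here.

```python
import warnings

with warnings.catch_warnings():
    # Newer interpreters may warn about "0xfor"; ignore that here.
    warnings.simplefilter("ignore")
    # x is never looked up, so an empty namespace is enough.
    result = eval("[0xfor x in (1, 2, 3)]", {})

print(result)
```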

Appendix A: The PEG parser

In pegen.c's parsenumber_raw function, we can see how Python treats numbers with a leading zero:

    if (s[0] == '0') {
        x = (long)PyOS_strtoul(s, (char **)&end, 0);
        if (x < 0 && errno == 0) {
            return PyLong_FromString(s, (char **)0, 0);
        }
    }

PyOS_strtoul is in Python/mystrtoul.c.
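
Both calls in that snippet pass 0 as the base, which means "infer the base from the prefix". A Python-level analogue is int() with base 0. This is only an analogy for illustration and does not claim to run the same C code path.

```python
# With base 0, int() picks the base from the literal's prefix.
literals = ("0xf", "0o17", "0b1111", "15")
values = [int(literal, 0) for literal in literals]

for literal, value in zip(literals, values):
    print(literal, "->", value)
```

All four spellings denote the same integer, 15.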

Inside mystrtoul.c, the parser looks at the character after the 0x prefix. If it is a hexadecimal digit, Python sets the base of the number to 16:

    if (*str == 'x' || *str == 'X') {
        /* there must be at least one digit after 0x */
        if (_PyLong_DigitValue[Py_CHARMASK(str[1])] >= 16) {
            if (ptr)
                *ptr = (char *)str;
            return 0;
        }
        ++str;
        base = 16;
    }
    ...

Then it parses the rest of the number for as long as the characters are hexadecimal digits:

    while ((c = _PyLong_DigitValue[Py_CHARMASK(*str)]) < base) {
        if (ovlimit > 0) /* no overflow check required */
            result = result * base + c;
        ...
        ++str;
        --ovlimit;
    }

Finally, it sets the pointer to the position where scanning stopped, which is one character past the last hexadecimal digit:

    if (ptr)
        *ptr = (char *)str;
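
The scanning loop can be paraphrased in Python. This is a simplified sketch, not a port of mystrtoul.c: the function name scan_hex_prefix is hypothetical, overflow handling is left out, and a missing digit after 0x raises ValueError instead of returning 0 as the C code does.

```python
import string


def scan_hex_prefix(text):
    """Return (value, end_index) for the hex literal at the start of text."""
    has_prefix = text[:2].lower() == "0x"
    if not has_prefix or len(text) < 3 or text[2] not in string.hexdigits:
        # Simplification: the C code returns 0 here instead of raising.
        raise ValueError("expected 0x followed by at least one hex digit")

    value = 0
    index = 2  # skip the "0x" prefix
    while index < len(text) and text[index] in string.hexdigits:
        value = value * 16 + int(text[index], 16)
        index += 1
    # index now sits one character past the last hexadecimal digit.
    return value, index


source = "0xfor x in (1, 2, 3)"
value, end = scan_hex_prefix(source)
print(value, repr(source[end:]))
```

Scanning stops at the "o", so the value is 15 and the remaining text starts with or.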

Thanks

CSI_Tech_Dept from reddit for referring me to the correct section in the tokenizer.c file.
The original Tweet.

156

answered Sep 22 '22 08:09

Yam Mesicka

