
How does source encoding apply within string literals?

PEP 263 specifies that the encoding declared in a source file is applied in the following steps:

  1. read the file

  2. decode it into Unicode assuming a fixed per-file encoding

  3. convert it into a UTF-8 byte string

  4. tokenize the UTF-8 content

  5. compile it, creating Unicode objects from the given Unicode data and creating string objects from the Unicode literal data by first reencoding the UTF-8 data into 8-bit string data using the given file encoding
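Steps 1 to 4 can be imitated with a small sketch. It is written for Python 3, where `rot_13` is a text-to-text codec reached through the `codecs` module, so it only approximates what the Python 2 parser does internally. The helper name `decode_and_tokenize` and the `FILE_ENCODING` constant are made up for illustration.

```python
import codecs
import io
import tokenize

# Assumption: stands in for the "# coding: rot13" declaration.
FILE_ENCODING = "rot_13"


def decode_and_tokenize(file_text):
    """Imitate steps 1-4 and return the NAME and STRING tokens."""
    # Steps 1-2: take the file contents and decode them with the file encoding.
    decoded = codecs.decode(file_text, FILE_ENCODING)
    # Step 3: convert the decoded text into a UTF-8 byte string.
    utf8_bytes = decoded.encode("utf-8")
    # Step 4: tokenize the UTF-8 content.
    tokens = tokenize.tokenize(io.BytesIO(utf8_bytes).readline)
    return [
        tok.string
        for tok in tokens
        if tok.type in (tokenize.NAME, tokenize.STRING)
    ]


rot13_source = "cevag 'nopqrstu'\ncevag h'nopqrstu'\n"
print(decode_and_tokenize(rot13_source))
```

At this point both literals hold the decoded text `abcdefgh`; any difference between them can only come from step 5.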

So, if I take this code:

print 'abcdefgh'
print u'abcdefgh'

And convert it to ROT-13:

# coding: rot13

cevag 'nopqrstu'
cevag h'nopqrstu'

I would expect that it is first decoded and then becomes identical to the original, printing:

abcdefgh
abcdefgh

But instead, it prints:

nopqrstu
abcdefgh

So, the unicode literal works as expected, but the str literal remains unconverted. Why?


Eliminating some possibilities:

I confirmed that the problem is not in a later phase (printing to the console) but arises immediately at parsing, because this code produces "ValueError: unsupported format character 'q' (0x71) at index 1":

x = '%q' % 1  # '%q' is '%d' in ROT-13
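The same check can be reproduced outside a ROT-13 source file. This is a rough Python 3 sketch, and the variable names are invented for illustration: it decodes the literal as written, re-encodes it the way step 5 describes for str literals, and then applies the `%` operator to the result.

```python
import codecs

written = "%q"                                 # text of the literal in the ROT-13 file
decoded = codecs.decode(written, "rot_13")     # what the tokenizer sees after decoding
reencoded = codecs.encode(decoded, "rot_13")   # a str literal is re-encoded in step 5

error = None
try:
    reencoded % 1
except ValueError as exc:
    # The format string still contains the unsupported 'q' conversion.
    error = str(exc)

print(decoded, reencoded, error)
```

Only ASCII letters are rotated by ROT-13, so the `%` sign survives both conversions unchanged.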
zvone asked Feb 25 '26 16:02


1 Answer

I guess the last point actually explains what happens quite accurately:

  5. compile it, creating Unicode objects from the given Unicode data and creating string objects from the Unicode literal data by first reencoding the UTF-8 data into 8-bit string data using the given file encoding

After the first four steps, the contents of the source file are a tokenized Unicode version of the following code:

print 'abcdefgh'
print u'abcdefgh'

After that, in step 5, the str literal 'abcdefgh' is re-encoded into 8-bit string data using the given file encoding (which is rot13), while the unicode literal is left as decoded, so the program effectively becomes:

print 'nopqrstu'
print u'abcdefgh'
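The asymmetry in step 5 can be sketched as follows. This is an illustration in Python 3 under simplifying assumptions, not the compiler's actual code: the hypothetical helper `literal_value` takes an already decoded, single-quoted literal token and returns the value the program would end up with.

```python
import codecs

# Assumption: stands in for the "# coding: rot13" declaration.
FILE_ENCODING = "rot_13"


def literal_value(token):
    """Return the runtime value of a decoded, single-quoted literal token."""
    if token.startswith("u"):
        # Unicode literal: the decoded text is used as is.
        return token[2:-1]
    # str literal: the decoded text is re-encoded with the file encoding.
    return codecs.encode(token[1:-1], FILE_ENCODING)


print(literal_value("'abcdefgh'"))   # nopqrstu
print(literal_value("u'abcdefgh'"))  # abcdefgh
```

For an ordinary 8-bit encoding such as latin-1, the re-encoding step gives back the bytes that were in the file, which is why the effect goes unnoticed. ROT-13 makes it visible because it changes plain ASCII letters.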
zvone answered Feb 28 '26 04:02


