
Does Python forbid two similarly looking Unicode identifiers?

Tags: python, unicode

I was playing around with Unicode identifiers and stumbled upon this:

>>> 𝑓, x = 1, 2
>>> 𝑓, x
(1, 2)
>>> 𝑓, f = 1, 2
>>> 𝑓, f
(2, 2)

What's going on here? Why does Python replace the object referenced by 𝑓, but only sometimes? Where is that behavior described?

Erik Cederstrand asked Jun 08 '20 06:06


2 Answers

PEP 3131 -- Supporting Non-ASCII Identifiers says:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

You can use unicodedata to test the conversions:

import unicodedata

unicodedata.normalize('NFKC', '𝑓')
# f

which indicates that '𝑓' is converted to 'f' during parsing, so both spellings refer to the same name. That leads to the expected:

𝑓 = "Some String"
print(f)
# Some String
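As a small sketch of the same thing from the interpreter's side (the variable names `fancy`, `namespace` and `user_names` are only for illustration), you can exec an assignment into a dict and look at which key actually ends up there. On CPython I'd expect only the plain ASCII name to be stored:

```python
import unicodedata

# MATHEMATICAL ITALIC SMALL F, the character used in the question
fancy = "\U0001d453"

# NFKC folds the compatibility character to plain ASCII 'f'.
normalized = unicodedata.normalize("NFKC", fancy)

# Run an assignment that is spelled with the fancy name, then inspect
# the namespace dict that received the binding.
namespace = {}
exec(f"{fancy} = 1", namespace)

# exec() adds '__builtins__' to the dict, so filter it out.
user_names = [name for name in namespace if name != "__builtins__"]

print(unicodedata.name(fancy))
print(normalized == "f")
print(user_names)
```

This is why the second assignment in the question overwrites the first: after parsing there is only one name, f.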
Mark answered Nov 06 '22 21:11


Here's a small example, just to show how horrible this "feature" is:

𝕋𝐡ᵢ𝔰_ｆ𝔢𝘢𝚝𝓊ᵣₑ_𝕤ₕ𝔬𝔲𝖑𝔡_dₑ𝕗ᵢ𝘯ｉ𝘵𝚎ℓy_𝒷𝘦_𝐚_𝚋ᵘg = 42
print(T𝗵ℹ𝚜_𝒇e𝖆𝚝𝙪ᵣe_ₛ𝔥º𝓾𝗹𝙙_𝚍e𝒇ᵢ𝒏ⁱｔᵉ𝕝𝘆_𝖻ℯ_𝔞_𝖇𝖚𝓰)
# => 42

Try it online! (But please don't use it)
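If you'd rather spot such names than write them, here is a crude sketch. `find_non_nfkc_names` is a hypothetical helper, not something from the standard library, and it scans with a regex rather than a real tokenizer, so it will also look inside strings and comments:

```python
import re
import unicodedata


def find_non_nfkc_names(source):
    """Return (as_written, as_parsed) pairs for names that NFKC would change.

    Hypothetical helper: a regex scan, not a real tokenizer, so words
    inside string literals and comments are checked as well.
    """
    hits = []
    for word in re.findall(r"\w+", source):
        folded = unicodedata.normalize("NFKC", word)
        # Only report words that are valid identifiers and are not
        # already in NFKC form.
        if word.isidentifier() and folded != word:
            hits.append((word, folded))
    return hits


# "\U0001d453" is MATHEMATICAL ITALIC SMALL F, as in the question.
sample = "\U0001d453 = 1\nx = 2\nprint(f, x)\n"
hits = find_non_nfkc_names(sample)
print(hits)
```

For the sample above this should report only the fancy 𝑓, paired with the plain f it is parsed as.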

And as mentioned by @MarkMeyer, the opposite also happens: two identifiers can be distinct even though they look exactly the same, because NFKC does not unify characters from different scripts (here "CYRILLIC CAPITAL LETTER A" and "LATIN CAPITAL LETTER A"):

А = 42
print(A)
# => NameError: name 'A' is not defined
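A quick way to tell such lookalikes apart is to ask unicodedata for their names. This is just a sketch; `cyrillic_a`, `latin_a` and `same_after_nfkc` are names I made up for the example:

```python
import unicodedata

cyrillic_a = "\u0410"  # CYRILLIC CAPITAL LETTER A
latin_a = "A"          # LATIN CAPITAL LETTER A

# Print the code point and the official Unicode name of each character.
for ch in (cyrillic_a, latin_a):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC leaves both characters unchanged, so they stay different names.
same_after_nfkc = (
    unicodedata.normalize("NFKC", cyrillic_a)
    == unicodedata.normalize("NFKC", latin_a)
)
print(same_after_nfkc)
```

Since normalization leaves both characters as they are, Python treats them as two separate identifiers, hence the NameError.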
Eric Duminil answered Nov 06 '22 19:11