I have learnt from PEP 3131 that non-ASCII identifiers are supported in Python, though using them is not considered best practice.
However, I get this strange behaviour, where my 𝜏
identifier (U+1D70F) seems to be automatically converted to τ
(U+03C4).
class Base(object):
    def __init__(self):
        self.𝜏 = 5  # defined with U+1D70F

a = Base()
print(a.𝜏)      # 5          # (U+1D70F)
print(a.τ)      # 5 as well  # (U+03C4) ? another way to access it?
d = a.__dict__  # {'τ': 5}   # (U+03C4) ? seems converted
print(d['τ'])   # 5          # (U+03C4) ? consistent with the conversion
print(d['𝜏'])   # KeyError: '𝜏'  # (U+1D70F) ?! unexpected!
Is that expected behaviour? Why does this silent conversion occur? Does it have anything to do with NFKC normalization? I thought that was only for canonically ordering Unicode character sequences...
Per the documentation on identifiers:
All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
You can see that U+03C4 is the appropriate result using unicodedata:
>>> import unicodedata
>>> unicodedata.normalize('NFKC', '𝜏')
'τ'
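To confirm that the two characters really are distinct codepoints, you can also ask unicodedata for their names (output is what I'd expect from the standard Unicode database):

>>> unicodedata.name('𝜏')
'MATHEMATICAL ITALIC SMALL TAU'
>>> unicodedata.name('τ')
'GREEK SMALL LETTER TAU'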
However, this conversion doesn't apply to string literals, like the one you're using as a dictionary key, so the lookup uses the unconverted character against a dictionary that only contains the converted character.
self.𝜏 = 5 # implicitly converted to "self.τ = 5"
a.𝜏 # implicitly converted to "a.τ"
d['𝜏'] # not converted
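That also means you can make the dictionary lookup succeed by applying the same normalization to the string literal yourself. Continuing the same interpreter session (with d = a.__dict__ from the question and unicodedata already imported):

>>> d[unicodedata.normalize('NFKC', '𝜏')]
5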
You can see similar problems with e.g. string literals used with getattr:
>>> getattr(a, '𝜏')
Traceback (most recent call last):
File "python", line 1, in <module>
AttributeError: 'Base' object has no attribute '𝜏'
>>> getattr(a, unicodedata.normalize('NFKC', '𝜏'))
5
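If you regularly need to look attributes up from arbitrary strings, one option is a small wrapper that normalizes the name the same way the parser normalizes identifiers before delegating to getattr. A minimal sketch (nfkc_getattr is my own name, not part of any library):

>>> def nfkc_getattr(obj, name, *default):
...     # normalize the attribute name to NFKC, as the parser does for
...     # identifiers, then perform an ordinary getattr lookup
...     return getattr(obj, unicodedata.normalize('NFKC', name), *default)
...
>>> nfkc_getattr(a, '𝜏')  # works for either spelling of tau
5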