
Python modifies unicode identifiers?

Python 3.8 supports using a limited set of non-ASCII Unicode characters in identifiers. So, it seems that it is valid to use 𝚺 as a character in an identifier.

However, something is wrong...

Problem

def f(𝚺):
    print(f'{𝚺=}')

f(1)
f(𝚺=2)
f(**{'𝚺': 3})

The first two calls are fine, but the third fails:

𝚺=1
𝚺=2
Traceback (most recent call last):
  File "sigma.py", line 24, in <module>
    f(**{'𝚺': 3})
TypeError: f() got an unexpected keyword argument '𝚺'

Analysis

Let's see what is actually going on:

def f2(**kw):
    for name, value in kw.items():
        print(f'{name}={value}     {ord(name)=}')
f2(𝚺=2)
f2(**{'𝚺': 3})

It prints:

Σ=2     ord(name)=931
𝚺=3     ord(name)=120506

I called it with 𝚺 both times, but in the first call the name was changed to the very similar, simpler Σ.

It seems that a parameter named 𝚺 (U+1D6BA) is implicitly renamed to Σ (U+03A3) in the function definition, and that a keyword argument written as 𝚺 in a call is renamed to Σ as well, except when it is passed as a string key via **kwargs.

The Questions

Is this a bug? It does not look like it is accidental. Is it documented? Is there a set of true characters and a list of alias characters available somewhere?

asked Mar 21 '20 17:03 by zvone


1 Answer

I think this happens because of the way Python handles Unicode characters in identifiers.
If you assign a variable using either of the sigma letters you provided, Σ or 𝚺, you can also access it with the other one. Both of these snippets work:

>>> Σ = 5
>>> 𝚺
5
>>> 𝚺 = 5
>>> Σ
5

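You can see in globals() that the value is stored under Σ (ord: 931) in both cases.
My guess is that Python normalizes the identifier before performing the variable lookup. That would match the language reference, which says that all identifiers are converted to the normal form NFKC while parsing (this comes from PEP 3131). String keys passed through **kwargs never go through the parser, so they are left as they are.
There is a similar discussion, posted by me, in github/wtfpython.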
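The same check can be done outside the interactive session by running the assignment in a fresh namespace and looking at the key that actually gets stored. This is only a sketch: the `namespace` dict and the use of `exec` are my own choices for illustration, not something from the question.

```python
# Run the assignment in an empty namespace so we can inspect what was stored.
namespace = {}
exec("𝚺 = 5", namespace)  # source text uses U+1D6BA

# exec() adds '__builtins__' to the namespace, so filter dunder names out.
names = [n for n in namespace if not n.startswith("__")]

print(names)                    # ['Σ']
print([ord(n) for n in names])  # [931]
```

The only user-defined key is the plain Greek Σ, even though the source text used the mathematical bold 𝚺.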
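To test that guess, you can apply the normalization yourself with the standard `unicodedata` module. The sketch below assumes the normalization form is NFKC, and `aliases_of` is a hypothetical helper name I made up for this answer. It scans all code points, so it takes a moment to run.

```python
import sys
import unicodedata

bold_sigma = "\U0001D6BA"   # 𝚺 MATHEMATICAL BOLD CAPITAL SIGMA
plain_sigma = "\u03A3"      # Σ GREEK CAPITAL LETTER SIGMA

# NFKC folds the compatibility character into the plain Greek letter.
normalized = unicodedata.normalize("NFKC", bold_sigma)
print(normalized == plain_sigma)  # True


def aliases_of(target):
    """Return every single code point whose NFKC form equals `target`.

    Hypothetical helper: a brute-force scan, not an official alias table.
    """
    found = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, they are not real characters
        ch = chr(cp)
        if unicodedata.normalize("NFKC", ch) == target:
            found.append(ch)
    return found


sigma_aliases = aliases_of(plain_sigma)
print([f"U+{ord(c):04X}" for c in sigma_aliases])
```

As far as I know there is no separate list of "alias characters" in Python itself. The mapping comes from the Unicode compatibility decompositions, so a scan like this is one way to see which characters collapse to a given letter.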
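If you need the `**kwargs` call from the question to work, one possible workaround is to normalize the dictionary keys yourself before unpacking. This is a minimal sketch assuming NFKC is the right form; `nfkc_keys` is a hypothetical helper, not a standard library function.

```python
import unicodedata


def nfkc_keys(kwargs):
    """Return a copy of `kwargs` with every key normalized to NFKC.

    Hypothetical helper. Keys that normalize to the same string will
    silently overwrite each other.
    """
    return {unicodedata.normalize("NFKC", k): v for k, v in kwargs.items()}


def f(𝚺):  # the parser stores this parameter name as Σ (U+03A3)
    return 𝚺


# The raw key '𝚺' (U+1D6BA) would raise TypeError; the normalized key works.
result = f(**nfkc_keys({"𝚺": 3}))
print(result)  # 3
```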

answered Sep 20 '22 11:09 by musava_ribica