What Unicode symbols are accepted in Python 3 variable names?

I want to use a larger variety of Unicode symbols for variable names in my Python 3 scripts. What characters are acceptable to use in Python 3 variable names?

I recently started using Unicode symbols (such as Greek and Asian symbols) for code obfuscation.

Devyn Collier Johnson asked Jun 11 '13 12:06



1 Answer

According to PEP 3131, the first character of an identifier needs to belong to ID_Start, the rest to ID_Continue, defined as follows:

ID_Start is defined as all characters having one of the general categories uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers (Nl), the underscore, and characters carrying the Other_ID_Start property. XID_Start then closes this set under normalization, by removing all characters whose NFKC normalization is not of the form ID_Start ID_Continue* anymore.

ID_Continue is defined as all characters in ID_Start, plus nonspacing marks (Mn), spacing combining marks (Mc), decimal numbers (Nd), connector punctuations (Pc), and characters carrying the Other_ID_Continue property. Again, XID_Continue closes this set under NFKC-normalization; it also adds U+00B7 to support Catalan.
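If you just want to check a particular character rather than read the property tables, a quick sketch like the following should work. It relies on `str.isidentifier()` and `unicodedata.category()` from the standard library; the helper name `describe` and the sample characters are only illustrative. Note that `isidentifier()` also returns True for keywords, so it tells you about the characters, not about whether the name is reserved.

```python
import unicodedata


def describe(ch):
    """Return (general category, can start an identifier, can continue one)."""
    category = unicodedata.category(ch)
    # A single character is a valid identifier only if it may appear first.
    can_start = ch.isidentifier()
    # Prefix a known-valid start character to test the "continue" position.
    can_continue = ("a" + ch).isidentifier()
    return category, can_start, can_continue


# Sample characters chosen for illustration only.
for ch in ["x", "λ", "変", "_", "7", "$", "€"]:
    print(repr(ch), describe(ch))
```

Letters such as `λ` or `変` are accepted in any position, digits only after the first character, and currency symbols such as `$` or `€` not at all.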

That's a long list (currently around 120,000 characters). Fortunately, there is a helpful project on GitHub that contains the list and a script to generate it.
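You can also build the list yourself by asking the interpreter about every code point. This is a rough sketch, not the script from that GitHub project; the function name `identifier_chars` is made up for this example, and the counts you get depend on the Unicode version bundled with your Python build.

```python
import sys


def identifier_chars(start=0, stop=sys.maxunicode + 1):
    """Split code points in [start, stop) into identifier-start characters
    and characters that are only valid after the first position."""
    starts, continues = [], []
    for cp in range(start, stop):
        ch = chr(cp)
        if ch.isidentifier():
            starts.append(ch)
        elif ("a" + ch).isidentifier():
            # Valid inside an identifier, but not as its first character.
            continues.append(ch)
    return starts, continues


starts, continues = identifier_chars()
print(len(starts), "characters can start an identifier")
print(len(continues), "more can appear only after the first character")
```

Passing a narrower range, for example `identifier_chars(0x370, 0x400)` for the Greek block, keeps the output manageable if you only care about one script.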

Tim Pietzcker answered Sep 28 '22 03:09