
How to enforce ASCII-only identifiers in Python while allowing UTF-8 strings?

I want to configure Python so that it raises an error when encountering non-ASCII characters in identifiers (e.g., variable names, function names) but still accepts UTF-8 encoded strings (e.g., "Привет, мир!"). For example:

# This should raise an error
def тест(): 
    pass

# This should work
text = "Привет, мир!"

I know about # -*- coding: ascii -*-, but it blocks non-ASCII characters everywhere in the source code, including in string literals.

(I have the same question for Jupyter notebooks.)

Филя Усков asked Oct 23 '25 15:10


2 Answers

This is easily checked with static code analysis. Pylint will report an issue in its default configuration:

foo.py:2:0: C2401: Function name "тест" contains a non-ASCII character, consider renaming it. (non-ascii-name)

You should configure your VCS to run pylint and accept only commits that produce no warnings, or at least no C2401.
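One way to wire this into Git is a pre-commit hook. The sketch below is only an illustration: the helper names (`staged_python_files`, `pylint_command`, `run_check`) are hypothetical, and it assumes pylint is installed in the same interpreter and is recent enough to know the `non-ascii-name` message.

```python
import subprocess
import sys


def staged_python_files():
    """List staged .py files (added, copied, or modified) as reported by git."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [path for path in out.splitlines() if path.endswith(".py")]


def pylint_command(files):
    """Build a pylint invocation that reports only non-ascii-name (C2401)."""
    return [sys.executable, "-m", "pylint",
            "--disable=all", "--enable=non-ascii-name", *files]


def run_check():
    """Return pylint's exit status; non-zero means the commit should be rejected."""
    files = staged_python_files()
    if not files:
        return 0  # nothing to check
    return subprocess.run(pylint_command(files)).returncode
```

Saved as an executable `.git/hooks/pre-commit` script that ends with `sys.exit(run_check())`, this would block a commit whenever pylint reports C2401 in a staged file.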

Friedrich answered Oct 25 '25 06:10


While @Friedrich's answer using pylint works for most cases, note that pylint is a third-party tool and can fall out of sync with new Python releases. For example, pylint to this date still does not recognize the match statement, which was added to Python's syntax in Python 3.10 (2021). Run pylint against the code below and you will find that it does not warn about the non-ASCII name:

match тест:
    case _:
        pass

And @globglogabgalab's answer using dir() works only for names defined in the module's global namespace and only those that happen to be defined in the current execution path.

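An arguably more robust approach is to rely on how names are represented in the AST: most identifiers are stored either as the id attribute of an ast.Name node or as the name attribute of other name-bearing node types derived from the base class ast.AST. (A few live in other fields, such as ast.arg.arg for parameters, ast.Attribute.attr for attribute names, and ast.alias.asname for import aliases, and would need extra cases.) The script below covers the id and name cases: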

import ast

with open(__file__, encoding='utf-8') as source:
    for node in ast.walk(ast.parse(source.read())):
        match node:
            case ast.Name(id=name):
                pass
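            # str(name) binds only string values, skipping None (e.g. `except E:`
            # without `as`) and node-valued names such as ast.TypeAlias.name
            case ast.AST(name=str(name)):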
                pass
            case _:
                continue
        if not name.isascii():
            raise RuntimeError(f'{name} not an ASCII identifier.')

def тест():
    pass

text = "Привет, мир!"

This produces:

RuntimeError: тест not an ASCII identifier.
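The script above looks only at the id and name fields. As a hedged extension, the sketch below also inspects a few other fields that hold identifier strings (parameters, attribute names, import aliases, global/nonlocal lists). The function name `non_ascii_identifiers` and the field lists are my own illustrative choices and are not guaranteed to be exhaustive across Python versions.

```python
import ast

# Fields that hold a single identifier string on some node type
# (illustrative list; an assumption, not an official enumeration).
NAME_FIELDS = ("id", "name", "arg", "attr", "asname", "module", "rest")
# Fields that hold a list of identifier strings (global/nonlocal, class patterns).
NAME_LIST_FIELDS = ("names", "kwd_attrs")


def non_ascii_identifiers(source):
    """Return (line number, identifier) pairs for non-ASCII identifiers in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        candidates = [getattr(node, field, None) for field in NAME_FIELDS]
        for field in NAME_LIST_FIELDS:
            value = getattr(node, field, None)
            if isinstance(value, list):
                candidates.extend(value)
        for name in candidates:
            # Non-string values (None, nested AST nodes) are not identifiers here.
            if isinstance(name, str) and not name.isascii():
                found.append((getattr(node, "lineno", None), name))
    return found
```

Calling `non_ascii_identifiers('def тест(пар):\n    pass\n')` would report both the function name and the parameter, while string literals are ignored because they are stored as ast.Constant values. One caveat: Python normalizes identifiers to NFKC while parsing, so a non-ASCII spelling that normalizes to ASCII (for example fullwidth letters) would likely need a tokenizer-based check instead.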


This approach is more future-proof because Python's developers are highly unlikely to abandon this convention in future syntax changes.
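For Jupyter, the same kind of AST check could be hooked into IPython, which passes each cell's AST through the objects in its `ast_transformers` list before execution. The following is a minimal sketch under that assumption; the class name `AsciiIdentifierChecker` and the `register` helper are hypothetical, and the list of inspected fields is illustrative rather than complete.

```python
import ast


class AsciiIdentifierChecker(ast.NodeTransformer):
    """Hypothetical transformer that rejects cells containing non-ASCII names."""

    FIELDS = ("id", "name", "arg", "attr", "asname")

    def visit(self, node):
        for child in ast.walk(node):
            for field in self.FIELDS:
                value = getattr(child, field, None)
                if isinstance(value, str) and not value.isascii():
                    raise self.rejection(f"{value} is not an ASCII identifier.")
        return node  # the cell's AST is left unchanged

    @staticmethod
    def rejection(message):
        # IPython keeps a transformer registered only if it raises InputRejected;
        # other exceptions cause it to be removed. Fall back to SyntaxError
        # when IPython is not installed.
        try:
            from IPython.core.error import InputRejected
        except ImportError:
            return SyntaxError(message)
        return InputRejected(message)


def register():
    """Call once from a notebook cell to enable the check for later cells."""
    from IPython import get_ipython
    get_ipython().ast_transformers.append(AsciiIdentifierChecker())
```

After `register()` has run in a notebook, a later cell containing `def тест(): pass` should be rejected before it executes, while a cell containing only `text = "Привет, мир!"` should run normally.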

blhsing answered Oct 25 '25 06:10