How to check if a string is a valid Python identifier, including a keyword check?

Does anyone know if there is a built-in Python method that checks whether something is a valid Python variable name, INCLUDING a check against reserved keywords? (So, for example, something like 'in' or 'for' would fail.)

Failing that, does anyone know where I can get a list of reserved keywords (i.e., dynamically, from within Python, as opposed to copying and pasting something from the online docs)? Or is there another good way of writing your own check?

Surprisingly, testing by wrapping a setattr in try/except doesn't work, as something like this:

setattr(myObj, 'My Sweet Name!', 23)

...actually works! (...and can even be retrieved with getattr!)
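
To reproduce this, here is a minimal sketch. The Holder class and the obj variable are hypothetical stand-ins for myObj; the point is only that setattr does not validate the attribute name.

```python
class Holder:
    """Hypothetical stand-in for the myObj instance mentioned above."""


obj = Holder()
name = 'My Sweet Name!'

# setattr stores the key in the instance __dict__ without checking
# that it is a valid identifier, so no exception is raised here.
setattr(obj, name, 23)

value = getattr(obj, name)   # reads the stored value back
stored = name in vars(obj)   # the odd key lives in obj.__dict__
```

So a try/except around setattr cannot serve as an identifier check.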

493

asked Oct 03 '12 01:10

Paul Molodowitch

1 Answer

Python 3

Python 3 now has 'foo'.isidentifier(), which seems to be the best solution for recent Python versions (thanks to fellow runciter@freenode for the suggestion). However, somewhat counter-intuitively, it does not check against the list of keywords, so a combination of both must be used:

import keyword

def isidentifier(ident: str) -> bool:
    """Determines if string is valid Python identifier."""

    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    if not ident.isidentifier():
        return False

    if keyword.iskeyword(ident):
        return False

    return True
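
A quick sketch of why both checks are needed, and of how to get the reserved words from within Python: keyword.kwlist is the standard-library list that keyword.iskeyword is derived from. The is_valid_name helper is a hypothetical name used only for illustration.

```python
import keyword

# str.isidentifier() only checks the lexical form, so keywords pass it.
lexically_ok = 'for'.isidentifier()
reserved = keyword.iskeyword('for')

# keyword.kwlist holds the reserved words of the running interpreter.
all_keywords = list(keyword.kwlist)


def is_valid_name(ident):
    """Hypothetical helper: identifier syntax plus keyword check."""
    return ident.isidentifier() and not keyword.iskeyword(ident)
```

Because the list comes from the interpreter itself, it should stay in sync with whatever Python version runs the check.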

Python 2

For Python 2, the easiest way to check whether a given string is a valid Python identifier is to let Python parse it itself.

There are two possible approaches. The fastest is to use ast and check whether the AST of a single expression has the desired shape:

import ast

def isidentifier(ident):
    """Determines, if string is valid Python identifier."""

    # Smoke test - if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Resulting AST of simple identifier is <Module [<Expr <Name "foo">>]>
    try:
        root = ast.parse(ident)
    except SyntaxError:
        return False

    if not isinstance(root, ast.Module):
        return False

    if len(root.body) != 1:
        return False

    if not isinstance(root.body[0], ast.Expr):
        return False

    if not isinstance(root.body[0].value, ast.Name):
        return False

    if root.body[0].value.id != ident:
        return False

    return True
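
To see which shapes the function above is filtering on, a small inspection sketch may help. The top_level_shape helper is hypothetical and written for Python 3; it only reports the node type names that ast.parse produces for a given source string.

```python
import ast


def top_level_shape(source):
    """Hypothetical helper: node type names for a one-statement source.

    Returns (statement_type, value_type), or None when the source does
    not parse or does not consist of exactly one statement.
    """
    try:
        root = ast.parse(source)
    except SyntaxError:
        return None
    if len(root.body) != 1:
        return None
    stmt = root.body[0]
    value = getattr(stmt, 'value', None)
    value_type = type(value).__name__ if value is not None else None
    return type(stmt).__name__, value_type
```

A plain name comes back as an Expr wrapping a Name, while a keyword such as pass yields a different statement node, and input such as 'foo bar' does not parse at all.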

The other is to let the tokenize module split the identifier into a stream of tokens and check that it contains only our name:

import keyword
import tokenize

def isidentifier(ident):
    """Determines if string is valid Python identifier."""

    # Smoke test - if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Quick test - if string is in keyword list, it's definitely not an ident.
    if keyword.iskeyword(ident):
        return False

    readline = lambda g=(lambda: (yield ident))(): next(g)
    tokens = list(tokenize.generate_tokens(readline))

    # You should get exactly 2 tokens
    if len(tokens) != 2:
        return False

    # First is NAME, identifier.
    if tokens[0][0] != tokenize.NAME:
        return False

    # Name should span all the string, so there would be no whitespace.
    if ident != tokens[0][1]:
        return False

    # Second is ENDMARKER, ending stream
    if tokens[1][0] != tokenize.ENDMARKER:
        return False

    return True

The same function, but compatible with Python 3, looks like this:

import keyword
import tokenize

def isidentifier_py3(ident):
    """Determines if string is valid Python identifier."""

    # Smoke test - if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Quick test - if string is in keyword list, it's definitely not an ident.
    if keyword.iskeyword(ident):
        return False

    readline = lambda g=(lambda: (yield ident.encode('utf-8-sig')))(): next(g)
    tokens = list(tokenize.tokenize(readline))

    # You should get exactly 3 tokens.
    # Caveat: Python 3.8 and later emit an implicit NEWLINE token before
    # ENDMARKER when the input lacks a trailing newline, so this count
    # would need to be adjusted on those versions.
    if len(tokens) != 3:
        return False

    # If using Python 3, first one is ENCODING, it's always utf-8 because
    # we explicitly passed in UTF-8 BOM with ident.
    if tokens[0].type != tokenize.ENCODING:
        return False

    # Second is NAME, identifier.
    if tokens[1].type != tokenize.NAME:
        return False

    # Name should span all the string, so there would be no whitespace.
    if ident != tokens[1].string:
        return False

    # Third is ENDMARKER, ending stream
    if tokens[2].type != tokenize.ENDMARKER:
        return False

    return True

However, be aware of bugs in the Python 3 tokenize implementation that cause it to reject some completely valid identifiers like ℘᧚, ﮯ and 贈ᩭ. ast works fine, though. Generally, I'd advise against using a tokenize-based implementation for actual checks.
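
Since the tokenize-based check depends on the exact token stream, it can be worth inspecting what a given interpreter actually yields before relying on a fixed token count. The token_names helper below is a hypothetical sketch for Python 3; the exact list may differ between versions (for instance, newer releases add an implicit NEWLINE token).

```python
import io
import tokenize


def token_names(source):
    """Hypothetical helper: list the token type names tokenize yields."""
    readline = io.BytesIO(source.encode('utf-8')).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.tokenize(readline)]


# The stream starts with ENCODING and ends with ENDMARKER; what sits
# between the NAME and the ENDMARKER depends on the Python version.
names = token_names('foo')
```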

Also, some may consider heavy machinery like the AST parser to be a tad overkill. This simple implementation is self-contained and guaranteed to work on any Python 2:

import keyword
import string

def isidentifier(ident):
    """Determines if string is valid Python identifier."""

    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    if not ident:
        return False

    if keyword.iskeyword(ident):
        return False

    first = '_' + string.lowercase + string.uppercase
    if ident[0] not in first:
        return False

    other = first + string.digits
    for ch in ident[1:]:
        if ch not in other:
            return False

    return True
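
The same ASCII-only rule can also be written as a regular expression. This is a swapped-in technique, not part of the approach above: a sketch that assumes identifiers are restricted to ASCII letters, digits and underscores, with is_ascii_identifier as a hypothetical name.

```python
import keyword
import re

# ASCII-only rule: a letter or underscore, then letters, digits, underscores.
# \Z anchors at the very end, so a trailing newline is rejected too.
_ASCII_IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')


def is_ascii_identifier(ident):
    """Hypothetical regex variant of the simple ASCII-only check."""
    if keyword.iskeyword(ident):
        return False
    return _ASCII_IDENT.match(ident) is not None
```

Like the loop-based version, it deliberately rejects the Unicode identifiers that Python 3 would accept.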

Here are a few tests to check that these all work:

assert(isidentifier('foo'))
assert(isidentifier('foo1_23'))
assert(not isidentifier('pass'))    # syntactically correct keyword
assert(not isidentifier('foo '))    # trailing whitespace
assert(not isidentifier(' foo'))    # leading whitespace
assert(not isidentifier('1234'))    # number
assert(not isidentifier('1234abc')) # number and letters
assert(not isidentifier('👻'))      # Unicode not from allowed range
assert(not isidentifier(''))        # empty string
assert(not isidentifier('   '))     # whitespace only
assert(not isidentifier('foo bar')) # several tokens
assert(not isidentifier('no-dashed-names-for-you')) # no such thing in Python

# Unicode identifiers are only allowed in Python 3:
assert(isidentifier('℘᧚')) # Unicode $Other_ID_Start and $Other_ID_Continue

Performance

All measurements were conducted on my machine (MBPr Mid 2014) on the same randomly generated test set of 1 500 000 elements: 1 000 000 valid and 500 000 invalid. YMMV.

== Python 3:
method | calls/sec | faster
---------------------------
token  |    48 286 |  1.00x
ast    |   175 530 |  3.64x
native | 1 924 680 | 39.86x

== Python 2:
method | calls/sec | faster
---------------------------
token  |    83 994 |  1.00x
ast    |   208 206 |  2.48x
simple | 1 066 461 | 12.70x
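
For anyone who wants to take comparable measurements on their own machine, here is a rough sketch using timeit. It is not the harness that produced the numbers above: the sample generator, the alphabet, and the names native, make_samples and calls_per_second are all assumptions made for illustration.

```python
import keyword
import random
import string
import timeit


def native(ident):
    """The Python 3 check: identifier syntax plus keyword test."""
    return ident.isidentifier() and not keyword.iskeyword(ident)


def make_samples(n, seed=0):
    """Hypothetical generator: random short strings, valid and invalid mixed."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + '_ -'
    return [''.join(rng.choice(alphabet) for _ in range(rng.randint(1, 12)))
            for _ in range(n)]


def calls_per_second(func, samples, repeat=3):
    """Best-of-N throughput of func over the whole sample list."""
    timings = timeit.repeat(lambda: [func(s) for s in samples],
                            number=1, repeat=repeat)
    return len(samples) / min(timings)
```

Calling calls_per_second(native, make_samples(100000)) gives a calls/sec figure for that one implementation; the absolute value will depend on the hardware and the Python version.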

174

answered Oct 03 '22 09:10

toriningen

Tags: python, keyword, identifier, reserved