
How to get the names of the named variables from a Python format string

Is there a graceful way to get the names of the named %s-like variables in a string object? Like this:

string = '%(a)s and %(b)s are friends.'
names = get_names(string)  # ['a', 'b']

Known alternative ways:

  1. Parse names using regular expression, e.g.:

    import re
    names = re.findall(r'%\((\w+)\)[sdf]', string)  # ['a', 'b']; \w+ so multi-character names match too
    
  2. Use .format()-compatible formatting and Formatter().parse(string).

    How to get the variable names from the string for the format() method
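
For comparison, the second alternative might look like the sketch below. It only applies to str.format()-style templates, not to %s-like ones, and the function name format_field_names is made up for illustration.

```python
from string import Formatter


def format_field_names(template):
    """Return the replacement field names of a str.format()-style template.

    format_field_names is a hypothetical helper name, not part of the
    standard library.
    """
    # Formatter().parse() yields (literal_text, field_name, format_spec,
    # conversion) tuples; field_name is None for trailing literal text.
    return [field for _, field, _, _ in Formatter().parse(template)
            if field is not None]


names = format_field_names('{a} and {b} are friends.')  # ['a', 'b']
```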

But what about a string with %s-like variables?

PS: python 2.7

hackprime asked Jan 19 '16 12:01



1 Answer

In order to answer this question, you need to define "graceful". Several factors might be worth considering:

  1. Is the code short, easy to remember, easy to write, and self-explanatory?
  2. Does it reuse the underlying logic (i.e. follow the DRY principle)?
  3. Does it implement exactly the same parsing logic?

Unfortunately, "%" formatting for strings is implemented in the C routine "PyString_Format" in stringobject.c. This routine provides no API or hooks that give access to a parsed form of the format string; it simply builds up the result as it parses the format string. Any solution that analyses the format string itself will therefore need to duplicate the parsing logic from the C routine. This means that DRY is not followed, and that the solution may break if the formatting specification ever changes.

The parsing algorithm in PyString_Format is fairly complex. Among other things it handles nested parentheses in key names, so it cannot be fully reproduced with a regular expression or with string "split()". Short of copying the C code from PyString_Format and converting it to Python, I do not see any remotely easy way of correctly extracting the names of the mapping keys under all circumstances.

So my conclusion is that there is no "graceful" way to obtain the names of the mapping keys for a Python 2.7 "%" format string.
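
One workaround that avoids re-implementing the parser is to let the "%" operator do the parsing and simply record which keys it looks up. The sketch below is only an illustration under several assumptions: KeyRecorder is a made-up helper class, get_names is the name used in the question, and the placeholder value 0 is chosen only because the usual conversion types accept it. It actually performs a formatting operation, so a format string that mixes keyed and positional conversions (or uses "*" widths) will raise an exception instead of returning names.

```python
class KeyRecorder(object):
    """Hypothetical helper: a mapping-like object that remembers every
    key the "%" operator asks it for."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        # Assumption: 0 is acceptable to the common conversion types
        # (s, r, d, i, o, x, X, e, E, f, F, g, G, c).
        return 0


def get_names(format_string):
    """Return the mapping keys that "%" formatting looks up, in order
    (a key used twice is reported twice)."""
    recorder = KeyRecorder()
    format_string % recorder  # the formatted result itself is discarded
    return recorder.keys


friends = get_names('%(a)s and %(b)s are friends.')  # ['a', 'b']
nested = get_names('%(should_match(but does not))s')
# nested == ['should_match(but does not)']
```

Whether this counts as "graceful" is debatable: it reuses the real parsing logic, but it depends on the side effects of a throwaway formatting call.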

The following code uses a regular expression to provide a partial solution that covers most common usage:

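import re

class StringFormattingParser(object):
    # Matches "%(key)" followed by optional flag/width/precision/length
    # characters and a conversion type. The lookbehind rejects a "%" that
    # directly follows another "%", as in the escaped "%%".
    __matcher = re.compile(r'(?<!%)%\(([^)]+)\)[-# +0-9.hlL]*[diouxXeEfFgGcrs]')

    @classmethod
    def getKeyNames(klass, formatString):
        # Return the mapping key names in order of appearance.
        return klass.__matcher.findall(formatString)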

# Demonstration of use with some sample format strings
for value in [
    '%(a)s and %(b)s are friends.',
    '%%(nomatch)i',
    '%%',
    'Another %(matched)+4.5f%d%% example',
    '(%(should_match(but does not))s',
    ]:
    print StringFormattingParser.getKeyNames(value)

# Note the following prints out "really does match"!
print '%(should_match(but does not))s' % {'should_match(but does not)': 'really does match'}
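
The nested-parenthesis case that the regular expression misses can be covered by a small hand-written scanner that counts parentheses while reading a key. The following is a rough sketch, not a port of PyString_Format: the function name get_mapping_keys is made up, and the scanner only follows the documented layout of a conversion specifier (key, flags, width, precision, length modifier, conversion type) without validating the conversion type.

```python
def get_mapping_keys(format_string):
    """Return the mapping keys of a %-style format string, in order.

    Hypothetical helper, a sketch only: it does not check that the
    conversion character is a valid one.
    """
    keys = []
    i, n = 0, len(format_string)
    while i < n:
        if format_string[i] != '%':
            i += 1
            continue
        i += 1  # step past '%'
        if i < n and format_string[i] == '(':
            # Read the key, counting nested parentheses.
            depth = 1
            i += 1
            start = i
            while i < n and depth:
                if format_string[i] == '(':
                    depth += 1
                elif format_string[i] == ')':
                    depth -= 1
                i += 1
            if depth:
                raise ValueError('incomplete format key')
            keys.append(format_string[start:i - 1])
        # Skip flags, width, precision and length modifier characters.
        while i < n and format_string[i] in '-+ #0123456789.*hlL':
            i += 1
        if i >= n:
            raise ValueError('incomplete format')
        i += 1  # step past the conversion type (or the second '%' of '%%')
    return keys


nested_keys = get_mapping_keys('(%(should_match(but does not))s')
# nested_keys == ['should_match(but does not)']
```

Like the regular expression, this duplicates parsing logic and so remains exposed to changes in the formatting specification.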

P.S. DRY = Don't Repeat Yourself (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)

J. Beattie answered Oct 23 '22 20:10
