Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using a regex as a template with Python

I have the idea to use a regex pattern as a template and wonder if there is a convenient way to do so in Python (3 or newer).

import re

pattern = re.compile("/something/(?P<id>.*)")
pattern.populate(id=1) # that is what I'm looking for

should result in

/something/1
like image 727
deamon Avatar asked Sep 08 '10 22:09

deamon


2 Answers

that's not what regex are for, you could just use normal string formatting.

>>> '/something/{id}'.format(id=1)
'/something/1'
like image 76
SilentGhost Avatar answered Sep 19 '22 19:09

SilentGhost


Below is a a light-weight class I created that does what you're looking for. You can write a single regular expression, and use that expression for both matching strings and generating strings.

There is a small example on the bottom of the code on how to use it.

Generally, you construct a regular expression normally, and use the match and search functions as normal. The format function is used much like string.format to generate a new string.

import re
regex_type = type(re.compile(""))

# This is not perfect. It breaks if there is a parenthesis in the regex.
re_term = re.compile(r"(?<!\\)\(\?P\<(?P<name>[\w_\d]+)\>(?P<regex>[^\)]*)\)")

class BadFormatException(Exception):
    pass

class RegexTemplate(object):
    def __init__(self, r, *args, **kwargs):
        self.r = re.compile(r, *args, **kwargs)
    
    def __repr__(self):
        return "<RegexTemplate '%s'>"%self.r.pattern
    
    def match(self, *args, **kwargs):
        '''The regex match function'''
        return self.r.match(*args, **kwargs)
    
    def search(self, *args, **kwargs):
        '''The regex match function'''
        return self.r.search(*args, **kwargs)
    
    def format(self, **kwargs):
        '''Format this regular expression in a similar way as string.format.
        Only supports true keyword replacement, not group replacement.'''
        pattern = self.r.pattern
        def replace(m):
            name = m.group('name')
            reg = m.group('regex')
            val = kwargs[name]
            if not re.match(reg, val):
                raise BadFormatException("Template variable '%s' has a value "
                    "of %s, does not match regex %s."%(name, val, reg))
            return val
        
        # The regex sub function does most of the work
        value = re_term.sub(replace, pattern)
        
        # Now we have un-escape the special characters. 
        return re.sub(r"\\([.\(\)\[\]])", r"\1", value)

def compile(*args, **kwargs):
    return RegexTemplate(*args, **kwargs)
    
if __name__ == '__main__':
    # Construct a typical URL routing regular expression
    r = RegexTemplate(r"http://example\.com/(?P<year>\d\d\d\d)/(?P<title>\w+)")
    print(r)
    
    # This should match
    print(r.match("http://example.com/2015/article"))
    # Generate the same URL using url formatting.
    print(r.format(year = "2015", title = "article"))
    
    # This should not match
    print(r.match("http://example.com/abcd/article"))
    # This will raise an exception because year is not formatted properly
    try:
        print(r.format(year = "15", title = "article"))
    except BadFormatException as e:
        print(e)
    

There are some limitations:

  • The format function only works with keyword arguments (you can't use the \1 style formatting as in string.format).
  • There is also a bug with matching elements with sub-elements, e.g., RegexTemplate(r'(?P<foo>biz(baz)?)'). This could be corrected with a bit of work.
  • If your regular expression contains character classes outside of a named group, (e.g., [a-z123]) we will not know how to format those.
like image 24
speedplane Avatar answered Sep 20 '22 19:09

speedplane