My goal here is to create a very simple template language. At the moment, I'm working on replacing a variable with a value, like this:
This input:
<%"TITLE"="This Is A Test Variable"%>The Web <%"TITLE"%>
Should produce this output:
The Web This Is A Test Variable
I've got it working. But looking at my code, I'm running multiple identical regexes on the same strings -- that just offends my sense of efficiency. There's got to be a better, more Pythonic way. (It's the two "while" loops that really offend.)
This does pass the unit tests, so if this is silly premature optimization, tell me -- I'm willing to let this go. There may be dozens of these variable definitions and uses in a document, but not hundreds. But I suspect there are obvious (to other people) ways of improving this, and I'm curious what the StackOverflow crowd will come up with.
import re

def stripMatchedQuotes(item):
    MatchedSingleQuotes = re.compile(r"'(.*)'", re.LOCALE)
    MatchedDoubleQuotes = re.compile(r'"(.*)"', re.LOCALE)
    item = MatchedSingleQuotes.sub(r'\1', item, 1)
    item = MatchedDoubleQuotes.sub(r'\1', item, 1)
    return item

def processVariables(item):
    VariableDefinition = re.compile(r'<%(.*?)=(.*?)%>', re.LOCALE)
    VariableUse = re.compile(r'<%(.*?)%>', re.LOCALE)
    Variables={}
    # First loop: record each definition, then remove it from the text.
    while VariableDefinition.search(item):
        VarName, VarDef = VariableDefinition.search(item).groups()
        VarName = stripMatchedQuotes(VarName).upper().strip()
        VarDef = stripMatchedQuotes(VarDef.strip())
        Variables[VarName] = VarDef
        item = VariableDefinition.sub('', item, 1)
    # Second loop: replace each remaining use with its stored value.
    while VariableUse.search(item):
        VarName = stripMatchedQuotes(VariableUse.search(item).group(1).upper()).strip()
        item = VariableUse.sub(Variables[VarName], item, 1)
    return item
The first thing that may improve things is to move the re.compile calls outside the functions. Compiled patterns are cached, but every call still pays for the lookup that checks whether the pattern has already been compiled.
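As a rough way to check that claim on your own machine, here is a minimal sketch that times a function compiling its pattern on every call against one using a module-level pattern. The names (strip_compiling_each_call, strip_precompiled, SINGLE_QUOTED) are made up for this illustration, and re.LOCALE is left out because Python 3 rejects that flag for str patterns.

```python
import re
import timeit

TEXT = "'quoted'"


def strip_compiling_each_call(item):
    # re.compile is called on every invocation; the pattern comes from
    # the module's internal cache, but the lookup still costs time.
    pattern = re.compile(r"'(.*)'")
    return pattern.sub(r"\1", item, count=1)


# Hypothetical module-level pattern, compiled once at import time.
SINGLE_QUOTED = re.compile(r"'(.*)'")


def strip_precompiled(item):
    return SINGLE_QUOTED.sub(r"\1", item, count=1)


if __name__ == "__main__":
    for func in (strip_compiling_each_call, strip_precompiled):
        seconds = timeit.timeit(lambda: func(TEXT), number=100000)
        print(f"{func.__name__}: {seconds:.3f}s")
```

Both functions return the same result; only the time spent per call should differ, and by how much depends on the interpreter version.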
Another possibility is to use a single regex as below:
MatchedQuotes = re.compile(r"(['\"])(.*)\1", re.LOCALE)
item = MatchedQuotes.sub(r'\2', item, 1)
Finally, you can combine this into the regex in processVariables. Taking Torsten Marek's suggestion to use a function for re.sub, this improves and simplifies things dramatically.
VariableDefinition = re.compile(r'<%(["\']?)(.*?)\1=(["\']?)(.*?)\3%>', re.LOCALE)
VarRepl = re.compile(r'<%(["\']?)(.*?)\1%>', re.LOCALE)

def processVariables(item):
    vars = {}
    def findVars(m):
        vars[m.group(2).upper()] = m.group(4)
        return ""

    item = VariableDefinition.sub(findVars, item)
    return VarRepl.sub(lambda m: vars[m.group(2).upper()], item)

print processVariables('<%"TITLE"="This Is A Test Variable"%>The Web <%"TITLE"%>')
Here are my timings for 100000 runs:
Original : 13.637
Global regexes : 12.771
Single regex : 9.095
Final version : 1.846
[Edit] Add missing non-greedy specifier
[Edit2] Added .upper() calls so case insensitive like original version
sub can take a callable as its argument rather than a simple string. Using that, you can replace all variables with one function call:
>>> import re
>>> var_matcher = re.compile(r'<%(.*?)%>', re.LOCALE)
>>> string = '<%"TITLE"%> <%"SHMITLE"%>'
>>> values = {'"TITLE"': "I am a title.", '"SHMITLE"': "And I am a shmitle."}
>>> var_matcher.sub(lambda m: values[m.group(1)], string)
'I am a title. And I am a shmitle.'
Follow eduffy.myopenid.com's advice and keep the compiled regexes around.
The same recipe can be applied to the first loop, only there you need to store the value of the variable first, and always return "" as replacement.
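Putting both halves of that recipe together might look like the sketch below. It is written for Python 3, so re.LOCALE is dropped (Python 3 rejects it for str patterns), and the helper names (strip_quotes, store, look_up) plus the anchored QUOTED pattern are assumptions of this example rather than anything from the question.

```python
import re

# Patterns compiled once at module level, as suggested above.
DEFINITION = re.compile(r"<%(.*?)=(.*?)%>")
USE = re.compile(r"<%(.*?)%>")
# Hypothetical helper pattern: one pair of matching quotes around the text.
QUOTED = re.compile(r"""^(['"])(.*)\1$""")


def strip_quotes(text):
    # Trim whitespace, then drop a surrounding pair of matching quotes.
    return QUOTED.sub(r"\2", text.strip())


def process_variables(template):
    values = {}

    def store(match):
        # First pass: remember the definition and erase it from the text.
        name, value = match.groups()
        values[strip_quotes(name).upper()] = strip_quotes(value)
        return ""

    def look_up(match):
        # Second pass: an undefined name raises KeyError here.
        return values[strip_quotes(match.group(1)).upper()]

    without_definitions = DEFINITION.sub(store, template)
    return USE.sub(look_up, without_definitions)


print(process_variables('<%"TITLE"="This Is A Test Variable"%>The Web <%"TITLE"%>'))
```

A side benefit of the callable form is that the returned string is inserted literally, so a backslash in a variable's value is not interpreted as an escape the way it would be in a plain replacement string.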
Never create your own programming language. Ever. (I used to have an exception to this rule, but not any more.)
There is always an existing language you can use which suits your needs better. If you elaborated on your use-case, people may help you select a suitable language.
Creating a templating language is all well and good, but shouldn't one of the goals of the templating language be easy readability and efficient parsing? The example you gave seems to be neither.
As Jamie Zawinski famously said:
Some people, when confronted with a problem, think "I know, I'll use regular expressions!" Now they have two problems.
If regular expressions are a solution to a problem you have created, the best bet is not to write a better regular expression, but to redesign your approach to eliminate their use entirely. Regular expressions are complicated, expensive, hugely difficult to maintain, and (ideally) should only be used for working around a problem someone else created.
You can match both kinds of quotes in one go with r"(\"|')(.*?)\1" - the \1 refers to the first group, so it will only match matching quotes.
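A small sketch of that pattern in use, assuming a helper called strip_matched_quotes (a name chosen here for illustration) that removes only the first quoted run:

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close.
QUOTED = re.compile(r"(\"|')(.*?)\1")


def strip_matched_quotes(text):
    # Replace the first quoted run with its contents (group 2).
    return QUOTED.sub(r"\2", text, count=1)


print(strip_matched_quotes('"TITLE"'))
print(strip_matched_quotes("'TITLE'"))
print(strip_matched_quotes('"TITLE\''))
```

The third call has mismatched quotes, so the pattern does not match and the string comes back unchanged.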