python: extracting variables from string templates

Tags:

string-formatting

I am familiar with the ability to insert variables into a string using Templates, like this:

from string import Template
Template('value is between $min and $max').substitute(min=5, max=10)

What I now want to know is whether it is possible to do the reverse. I want to take a string and extract the values from it using a template, so that I have some data structure (preferably just named variables, but a dict is fine) that contains the extracted values. For example:

>>> string = 'value is between 5 and 10'
>>> d = Backwards_template('value is between $min and $max').extract(string)
>>> print(d)
{'min': '5', 'max': '10'}

Is this possible?


asked Mar 01 '17 16:03

ewok

2 Answers

It's not possible to reverse the substitution perfectly, because some strings are ambiguous. For example,

value is between 5 and 7 and 10

has two possible solutions: min = "5", max = "7 and 10", or min = "5 and 7", max = "10".
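Both readings are valid matches, and which one a regex returns depends only on whether the first group is greedy or lazy. A small illustration, using hand-written patterns rather than ones derived from the template:

```python
import re

text = "value is between 5 and 7 and 10"

# Greedy .* in the first group grabs as much as it can, so the LAST " and " is the separator.
greedy = re.fullmatch(r"value is between (?P<min>.*) and (?P<max>.*)", text)
# Lazy .*? in the first group stops as early as it can, so the FIRST " and " is the separator.
lazy = re.fullmatch(r"value is between (?P<min>.*?) and (?P<max>.*)", text)

print(greedy.groupdict())  # {'min': '5 and 7', 'max': '10'}
print(lazy.groupdict())    # {'min': '5', 'max': '7 and 10'}
```

Neither answer is "right"; the regex just picks one, so you have to decide which reading you want.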

However, you might be able to achieve useful results with regex:

import re

string = 'value is between 5 and 10'
template = 'value is between $min and $max'

# Escape regex metacharacters; "$min" becomes "\$min".
pattern = re.escape(template)
# Turn each escaped "\$name" into a named capture group.
pattern = re.sub(r'\\\$(\w+)', r'(?P<\1>.*)', pattern)
match = re.match(pattern, string)
print(match.groupdict())  # output: {'min': '5', 'max': '10'}
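The same idea can be wrapped in a class shaped like the hypothetical Backwards_template from the question. This is only a sketch: the class name BackwardsTemplate and the lazy default group are my own choices. It reuses string.Template.pattern (the documented class attribute Template uses to find placeholders), so $name, ${name} and $$ are all handled. A template that repeats the same placeholder name is not supported, since Python regexes can't define a named group twice.

```python
import re
from string import Template


class BackwardsTemplate:
    """Hypothetical reverse of string.Template: compile the template into a regex."""

    def __init__(self, template, group=r".*?"):
        parts = []
        pos = 0
        # Template.pattern finds $$, $name, ${name} and invalid "$" uses in a template.
        for m in Template.pattern.finditer(template):
            parts.append(re.escape(template[pos:m.start()]))  # literal text between placeholders
            if m.group("escaped") is not None:
                parts.append(re.escape("$"))  # "$$" stands for a literal "$"
            else:
                name = m.group("named") or m.group("braced")
                if name is None:
                    raise ValueError("invalid placeholder in template")
                parts.append(f"(?P<{name}>{group})")
            pos = m.end()
        parts.append(re.escape(template[pos:]))
        self.regex = re.compile("".join(parts))

    def extract(self, string):
        # fullmatch anchors both ends, so the lazy groups still cover the whole string
        m = self.regex.fullmatch(string)
        return m.groupdict() if m else None


d = BackwardsTemplate('value is between $min and $max').extract('value is between 5 and 10')
print(d)  # {'min': '5', 'max': '10'}
```

With the lazy default, the ambiguous string above resolves to min = "5", max = "7 and 10"; pass group=r".*" to get the other reading.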


answered Oct 12 '22 21:10

Aran-Fey

That's what regular expressions are for:

import re
string = 'value is between 5 and 10'
m = re.match(r'value is between (.*) and (.*)', string)
print(m.group(1), m.group(2))

Output:

5 10

Update 1. Names can be given to groups:

m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
print(m.group('min'), m.group('max'))

But this feature is not used often, because a more important aspect usually causes enough problems on its own: how to capture exactly what you want. In this particular case that's not a big deal, but even here, what if the string is value is between 1 and 2 and 3? Should the string be accepted, and what are min and max?
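One way to answer that question is to make the groups stricter, so that an ambiguous string is rejected instead of being split arbitrarily. A sketch, assuming both values are plain decimal numbers (the NUMBER pattern and the parse_range helper are my own, not part of any library):

```python
import re

# Assumption: min and max are plain decimal numbers such as 5, -2 or 3.5.
NUMBER = r"[-+]?\d+(?:\.\d+)?"
RANGE = re.compile(rf"value is between (?P<min>{NUMBER}) and (?P<max>{NUMBER})")


def parse_range(text):
    """Return {'min': ..., 'max': ...} as strings, or None if text doesn't fit."""
    m = RANGE.fullmatch(text)  # fullmatch: leftover text like " and 3" is rejected
    return m.groupdict() if m else None


print(parse_range("value is between 5 and 10"))       # {'min': '5', 'max': '10'}
print(parse_range("value is between 1 and 2 and 3"))  # None
```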

Update 2. Rather than making a precise regex, it's sometimes easier to combine regular expressions and "regular" code like this:

m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
try:
    value_min = float(m.group('min'))
    value_max = float(m.group('max'))
except (AttributeError, ValueError):  # no match or failed conversion
    value_min = None
    value_max = None

This combined approach is especially worth remembering when your text consists of many chunks to be processed (like phrases in quotes of different types). In tricky cases, it's harder to write a single regex that handles both the delimiters and the contents of the chunks than to define several steps: text.split(), optional merging of chunks, and independent processing of each chunk (with regexes or other means).
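A minimal sketch of that split-then-process style, assuming a hypothetical input format of semicolon-separated name = value pairs (the format, CHUNK and parse_settings are illustrative only). The split handles the delimiters, a small regex handles each chunk, and ordinary code does the conversion:

```python
import re

# One "name = value" chunk, with optional whitespace around the parts.
CHUNK = re.compile(r"\s*(?P<name>\w+)\s*=\s*(?P<value>.*?)\s*")


def parse_settings(text):
    """Parse hypothetical 'name = value; name = value' text into a dict."""
    result = {}
    for chunk in text.split(";"):  # step 1: split on the delimiter
        if not chunk.strip():
            continue  # tolerate trailing or doubled ";"
        m = CHUNK.fullmatch(chunk)  # step 2: a small regex per chunk
        if m is None:
            raise ValueError(f"cannot parse chunk: {chunk!r}")
        value = m.group("value")
        try:
            result[m.group("name")] = float(value)  # step 3: plain code converts values
        except ValueError:
            result[m.group("name")] = value  # keep non-numeric values as text
    return result


print(parse_settings("min = 5; max = 10; unit = cm;"))
```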

answered Oct 12 '22 20:10

Kirill Bulygin
