
Python: extracting variables from string templates

I am familiar with the ability to insert variables into a string using Templates, like this:

from string import Template
Template('value is between $min and $max').substitute(min=5, max=10)

What I now want to know is if it is possible to do the reverse. I want to take a string, and extract the values from it using a template, so that I have some data structure (preferably just named variables, but a dict is fine) that contains the extracted values. For example:

>>> string = 'value is between 5 and 10'
>>> d = Backwards_template('value is between $min and $max').extract(string)
>>> print(d)
{'min': '5', 'max': '10'}

Is this possible?

ewok asked Mar 01 '17 16:03



2 Answers

It's not possible to perfectly reverse the substitution, because some strings are ambiguous. For example, the string

value is between 5 and 7 and 10

has two possible solutions: either min = "5" and max = "7 and 10", or min = "5 and 7" and max = "10".

However, you might be able to achieve useful results with regex:

import re

string = 'value is between 5 and 10'
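template = 'value is between $min and $max'

# Escape the template so its literal text matches verbatim ("$min" becomes "\$min")
pattern = re.escape(template)
# Turn each escaped "\$name" placeholder into a named capture group
pattern = re.sub(r'\\\$(\w+)', r'(?P<\1>.*)', pattern)
match = re.match(pattern, string)
print(match.groupdict())  # output: {'min': '5', 'max': '10'}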
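
To see how the ambiguity described above plays out, here is a minimal sketch that wraps the same steps in a function. The names extract and greedy are made up for illustration, and only plain $name placeholders are handled (not ${name} or $$). Which of the two readings you get depends on whether the groups are greedy or lazy:

```python
import re

def extract(template, text, greedy=True):
    """Hypothetical helper: recover $name values from text, or None if no match."""
    # Escape the literal parts of the template; "$name" becomes "\$name".
    pattern = re.escape(template)
    # Replace each escaped placeholder with a greedy or lazy named group.
    group = r'(?P<\1>.*)' if greedy else r'(?P<\1>.*?)'
    pattern = re.sub(r'\\\$(\w+)', group, pattern)
    # fullmatch forces the whole text to fit the template.
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None

template = 'value is between $min and $max'
ambiguous = 'value is between 5 and 7 and 10'

print(extract(template, ambiguous))                # {'min': '5 and 7', 'max': '10'}
print(extract(template, ambiguous, greedy=False))  # {'min': '5', 'max': '7 and 10'}
```

Neither answer is more correct than the other; the regex just picks one of the two readings for you.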
Aran-Fey answered Oct 12 '22 21:10


This is a job for regular expressions:

import re
string = 'value is between 5 and 10'
m = re.match(r'value is between (.*) and (.*)', string)
print(m.group(1), m.group(2))

Output:

5 10

Update 1. Names can be given to groups:

m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
print(m.group('min'), m.group('max'))

Named groups are not used that often, though, because the harder problem is usually capturing exactly what you want. That's not a big deal in this particular case, but even here there is a question to answer: if the string is "value is between 1 and 2 and 3", should it be accepted, and if so, what are min and max?
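
One way to answer that question is to narrow what each group may capture. The sketch below assumes the values are plain decimal numbers (the NUMBER pattern and the parse name are illustrative, not part of the question) and requires the whole string to match, so the "1 and 2 and 3" case is rejected rather than guessed at:

```python
import re

# Assumption: values are optionally signed integers or decimals.
NUMBER = r'-?\d+(?:\.\d+)?'
PATTERN = re.compile(
    r'value is between (?P<min>%s) and (?P<max>%s)' % (NUMBER, NUMBER)
)

def parse(text):
    """Return {'min': ..., 'max': ...} as strings, or None if text does not fit."""
    # fullmatch rejects trailing text such as " and 3".
    m = PATTERN.fullmatch(text)
    return m.groupdict() if m else None

print(parse('value is between 5 and 10'))       # {'min': '5', 'max': '10'}
print(parse('value is between 1 and 2 and 3'))  # None
```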


Update 2. Rather than making a precise regex, it's sometimes easier to combine regular expressions and "regular" code like this:

m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
try:
    value_min = float(m.group('min'))
    value_max = float(m.group('max'))
except (AttributeError, ValueError):  # no match or failed conversion
    value_min = None
    value_max = None

This combined approach is especially worth remembering when your text consists of many chunks to be processed (like phrases in quotes of different types). In tricky cases, it's harder to define a single regex that handles both the delimiters and the contents of the chunks than to define several steps: text.split(), optional merging of chunks, and independent processing of each chunk (using regexes and other means).
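
As a rough sketch of that multi-step idea, assume the input holds several clauses separated by semicolons (the separator, the sample text, and the parse_ranges name are all assumptions made for this example). Each chunk is split off first and then handled on its own with the same regex-plus-try pattern as above:

```python
import re

RANGE = re.compile(r'value is between (?P<min>.*) and (?P<max>.*)')

def parse_ranges(text, sep=';'):
    """Split text into chunks and parse each one independently."""
    results = []
    for chunk in text.split(sep):
        m = RANGE.fullmatch(chunk.strip())
        try:
            results.append((float(m.group('min')), float(m.group('max'))))
        except (AttributeError, ValueError):  # no match or failed conversion
            results.append(None)
    return results

text = 'value is between 5 and 10; value is between a and b; value is between 1.5 and 2'
print(parse_ranges(text))  # [(5.0, 10.0), None, (1.5, 2.0)]
```

A bad chunk only affects its own entry in the result, which is harder to arrange with one big regex over the whole text.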

Kirill Bulygin answered Oct 12 '22 20:10