Suppose I have a string template, e.g.,
string="This is a {object}"
Now I create two (or more) strings by formatting this string, e.g.,
string.format(object="car")
=>"This is a car"
string.format(object="2020-06-05 16:06:30")
=>"This is a 2020-06-05 16:06:30"
Now I have somehow lost the original string. Is there a way to recover it using the two new strings that I have now?
Note: I have a data set of these strings, which were created from a template, but the original template was lost during editing. New strings were then created from the new template and added to the same data set. I have tried an ML-based approach, but it doesn't seem to work in the general case. I am looking for an algorithm that gives me back the original string; the result could be one string or a group of strings if the template has been changed multiple times.
One possibility is to split each input string into words and formatted values (here, a date-time), and then compare the resulting tokens position by position:
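import re
def get_vals(s):
    # Tokenize: a "date time" value such as 2020-06-05 16:06:30 is one token; otherwise single words
    return re.findall(r'[\d\-]+\s[\d:]+|\w+', s)
vals = ["This is a car", "This is a 2020-06-05 16:06:30"]
# Compare tokens position by position: identical tokens are kept, differing ones become {object}
r = ' '.join('{object}' if len(set(i)) > 1 else i[0] for i in zip(*map(get_vals, vals)))
print(r)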
Output:
'This is a {object}'
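The zip-based comparison assumes every string splits into the same number of tokens. If a value spans a different number of words (e.g. "red car" vs. a timestamp), zip pairs the wrong tokens or silently drops the trailing ones. Below is a minimal sketch that instead aligns the tokens with the standard library's difflib.SequenceMatcher, so each differing run collapses into one placeholder, and then greedily groups strings that look like they share a template. The helper names (tokenize, merge, infer_template, group_by_template), the greedy grouping, and the 0.6 similarity threshold are illustrative assumptions, not an established algorithm. Like get_vals, the tokenizer drops punctuation, and every field comes back as {object} because the original field names cannot be recovered from the formatted output.

```python
import re
from difflib import SequenceMatcher
from functools import reduce

PLACEHOLDER = '{object}'

def tokenize(s):
    # Same idea as get_vals: a "date time" run is one token, otherwise single words.
    # Punctuation is dropped.
    return re.findall(r'[\d\-]+\s[\d:]+|\w+', s)

def merge(template_tokens, other_tokens):
    """Align two token lists; every run where they differ becomes one placeholder."""
    out = []
    matcher = SequenceMatcher(None, template_tokens, other_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == 'equal':
            out.extend(template_tokens[i1:i2])
        else:
            # 'replace', 'insert' or 'delete': the strings differ here
            out.append(PLACEHOLDER)
    return out

def infer_template(strings):
    """Fold the alignment over strings assumed to share one template (needs >= 1 string)."""
    return ' '.join(reduce(merge, map(tokenize, strings)))

def group_by_template(strings, threshold=0.6):
    """Greedy heuristic: a string joins the first group whose current template is
    at least `threshold` similar (SequenceMatcher.ratio); otherwise it starts a new group.
    The threshold is an arbitrary assumption and will need tuning on real data."""
    groups = []  # each entry: [template_tokens, member_strings]
    for s in strings:
        tokens = tokenize(s)
        for group in groups:
            ratio = SequenceMatcher(None, group[0], tokens, autojunk=False).ratio()
            if ratio >= threshold:
                group[0] = merge(group[0], tokens)
                group[1].append(s)
                break
        else:
            groups.append([tokens, [s]])
    return [(' '.join(t), members) for t, members in groups]

print(infer_template(["This is a red car today", "This is a 2020-06-05 16:06:30 today"]))
print(group_by_template(["This is a car", "This is a 2020-06-05 16:06:30",
                         "Price is 5 USD", "Price is 7 USD"]))
```

With these inputs the first call should give 'This is a {object} today' (the zip version would give 'This is a {object} {object}'), and the grouping should separate the two templates 'This is a {object}' and 'Price is {object} USD'. The greedy grouping depends on input order, so treat its output as a starting point for manual review rather than a definitive answer.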