
How to map the differences between two strings?

I came across the following question and was wondering what would be an elegant way to solve it. Let's say we have two strings:

string1 = "I love to eat $(fruit)"
string2 = "I love to eat apples"

The only difference between those strings is $(fruit) versus apples. So I can infer that fruit is apples, and a dict {fruit: apples} could be returned.

Another example would be:

string1 = "I have $(food1), $(food2), $(food3) for lunch"
string2 = "I have rice, soup, vegetables for lunch"

I would like to have a dict{food1:rice, food2:soup, food3:vegetables} as the result.

Does anyone have a good idea about how to implement this?

Edit:

I think I need the function to be more powerful.

ex.
string1 = "I want to go to $(place)"
string2 = "I want to go to North America"

result: {place : North America}

ex.
string1 = "I won $(index)place in the competition"
string2 = "I won firstplace in the competition"

result: {index : first}

The Rule would be: map the different parts of the string and make them a dict

So I guess all answers using str.split() or trying to split the string will not work. There is no rule that says what characters would be used as a separator in the string.

Billy asked Sep 27 '18 22:09


3 Answers

I think this can be cleanly done with regex-based splitting. This should also handle punctuation and other special characters (where a split on space is not enough).

import re

p = re.compile(r'[^\w$()]+')
mapping = {
    x[2:-1]: y for x, y in zip(p.split(string1), p.split(string2)) if x != y}

For your examples, this returns

{'fruit': 'apples'}

and

{'food1': 'rice', 'food2': 'soup', 'food3': 'vegetables'}
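
To try the snippet against the examples from the question's edit, it can be wrapped in a small helper. This is only a sketch: diff_by_tokens and TOKEN_SEP are hypothetical names, not part of the answer above. As far as I can tell, pairing tokens by position only works when each placeholder stands for exactly one token:

```python
import re

# Same separator pattern as in the answer: anything that is not
# a word character, '$', '(' or ')'.
TOKEN_SEP = re.compile(r'[^\w$()]+')

def diff_by_tokens(template, text):
    """Pair up tokens by position and keep the pairs that differ."""
    pairs = zip(TOKEN_SEP.split(template), TOKEN_SEP.split(text))
    # x looks like '$(name)', so x[2:-1] drops the '$(' and the ')'.
    return {x[2:-1]: y for x, y in pairs if x != y}

print(diff_by_tokens("I love to eat $(fruit)", "I love to eat apples"))
# {'fruit': 'apples'}
print(diff_by_tokens("I want to go to $(place)", "I want to go to North America"))
# {'place': 'North'}  (the second word is dropped by zip)
```

So a value with a separator inside it, like North America, comes back truncated.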

cs95 answered Sep 18 '22 04:09


One solution is to replace $(name) with (?P<name>.*) and use that as a regex:

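import re

def make_regex(text):
    # Turn every $(name) placeholder into a named group (?P<name>.*)
    replaced = re.sub(r'\$\((\w+)\)', r'(?P<\1>.*)', text)
    return re.compile(replaced)

def find_mappings(mapper, text):
    # match() returns None if text does not fit the template
    return make_regex(mapper).match(text).groupdict()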

Sample usage:

>>> string1 = "I have $(food1), $(food2), $(food3) for lunch"
>>> string2 = "I have rice, soup, vegetable for lunch"
>>> string3 = "I have rice rice rice, soup, vegetable for lunch"
>>> make_regex(string1).pattern
'I have (?P<food1>.*), (?P<food2>.*), (?P<food3>.*) for lunch'
>>> find_mappings(string1, string2)
{'food1': 'rice', 'food3': 'vegetable', 'food2': 'soup'}
>>> find_mappings(string1, string3)
{'food1': 'rice rice rice', 'food3': 'vegetable', 'food2': 'soup'}

Note that this can handle values containing non-alphanumeric characters such as spaces (see food1 and rice rice rice). The greedy .* groups may cause a lot of backtracking and be slow on long inputs. You can tweak the .* pattern to try to make it faster, depending on what you expect the "tokens" to look like.
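
One possible tweak, as a sketch only: use a lazy .+? so that each value is non-empty and as short as possible, and use fullmatch so the whole string has to fit the template. make_lazy_regex and find_mappings_lazy are hypothetical names, and whether this is actually faster depends on your input, so measure before relying on it. Like the version above, it does not escape the literal text.

```python
import re

def make_lazy_regex(template):
    # Turn each $(name) into a lazy named group that needs at least one character.
    return re.compile(re.sub(r'\$\((\w+)\)', r'(?P<\1>.+?)', template))

def find_mappings_lazy(template, text):
    # fullmatch() anchors both ends; return None instead of raising
    # AttributeError when the text does not fit the template.
    m = make_lazy_regex(template).fullmatch(text)
    return m.groupdict() if m else None

print(find_mappings_lazy("I won $(index)place in the competition",
                         "I won firstplace in the competition"))
# {'index': 'first'}
print(find_mappings_lazy("I want to go to $(place)",
                         "I want to go to North America"))
# {'place': 'North America'}
```

If you know a value can never contain a given character (a comma, say), a negated character class such as [^,]+ in place of .+? is another option.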


For production-ready code you'd want to re.escape the parts outside the (?P<name>.*) groups. It's a bit of a pain in the ass, because you have to "split" the string, call re.escape on each piece, put the pieces back together and then call re.compile.


Since my answer got accepted I wanted to include a more robust version of the regex:

def make_regex(text):
    regex = ''.join(map(extract_and_escape, re.split(r'\$\(', text)))
    return re.compile(regex)

def extract_and_escape(partial_text):
    m = re.match(r'(\w+)\)', partial_text)
    if m:
        group_name = m.group(1)
        return ('(?P<%s>.*)' % group_name) + re.escape(partial_text[len(group_name)+1:])
    return re.escape(partial_text)

This avoids issues when the text contains special regex characters (e.g. I have $(food1) and it costs $$$). The first solution would end up treating $$$ as three $ anchors (which would fail to match); this robust version escapes them.
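
The same escaping idea can also be written with a capturing re.split, which hands back literal text and placeholder names in alternation. This is a self-contained sketch of that variant, not the code from the answer; PLACEHOLDER, template_to_regex and extract are hypothetical names.

```python
import re

PLACEHOLDER = re.compile(r'\$\((\w+)\)')

def template_to_regex(template):
    # With one capturing group, split() returns
    # [literal, name, literal, name, ..., literal].
    parts = PLACEHOLDER.split(template)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2:
            # Odd positions are placeholder names.
            pieces.append('(?P<%s>.*)' % part)
        else:
            # Even positions are literal text and must be escaped.
            pieces.append(re.escape(part))
    return re.compile(''.join(pieces))

def extract(template, text):
    # Return the mapping, or None when the text does not fit the template.
    m = template_to_regex(template).fullmatch(text)
    return m.groupdict() if m else None

print(extract("I have $(food1) and it costs $$$",
              "I have rice and it costs $$$"))
# {'food1': 'rice'}
```

Placeholder names still have to be valid and unique regex group names, so a name starting with a digit or a name used twice will make re.compile raise an error.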

Giacomo Alzetta answered Sep 18 '22 04:09


I suppose this does the trick.

s_1 = 'I had $(food_1), $(food_2) and $(food_3) for lunch'
s_2 = 'I had rice, meat and vegetable for lunch'

result = {}
for elem1, elem2 in zip(s_1.split(), s_2.split()):
    if elem1.startswith('$'):
        # strip the trailing comma from both the placeholder and the value
        result[elem1.strip(',')[2:-1]] = elem2.strip(',')
print(result)
# {'food_1': 'rice', 'food_2': 'meat', 'food_3': 'vegetable'}

IMCoins answered Sep 19 '22 04:09