Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use Regular Expressions (or another python module) to compare text/characters?

Let's say that my program receives an input such as a string of characters that has any type of character. For example, 'Bob's Bagel Shop'. Then it gets another string that says 'Fred's Bagel Store'. How can I use regular expressions or some other module in python to compare these and have my program tell me if at least 5 (or any number I want) of the characters are the same anywhere in the string, but all in the same order, such as the word 'Bagel'?

Thanks.

like image 271
Marcus Johnson Avatar asked Aug 12 '12 18:08

Marcus Johnson


2 Answers

There's a Python standard library class difflib.SequenceMatcher that will help to solve your problem. Here's a code sample:

from difflib import SequenceMatcher

s1 = "Bob's Bagel Shop"
s2 = "Bill's Bagel Shop"

matcher = SequenceMatcher(a=s1, b=s2)
match = matcher.find_longest_match(0, len(s1), 0, len(s2))

Result:

Match(a=3, b=4, size=13)  # value that 'match' variable holds

The result shows that both string has equal substring with 13 characters length (starting from 3-rd char in first string and 4-th char in second string).

You can use this match result object to get its fields as values:

match.size  # 13
match.a     # 3
match.b     # 4
like image 138
Rostyslav Dzinko Avatar answered Nov 14 '22 23:11

Rostyslav Dzinko


you can use itetools.combinations and then use intersection of sets to find out matching characters from both strings:

from itertools import combinations
str1="Bob's Bagel Shop"
str2="Fred's Bagel Store"

def combi(strs):
    chars=''.join(strs.split())
    lis=[]
    for x in range(1,len(chars)):
        for y in combinations(chars,x):
            if ''.join(y) in chars:
                lis.append(''.join(y))
    return lis           


lis1=combi(str1)
lis2=combi(str2)
print max(set(lis1).intersection(set(lis2)),key=len)  

output:

'sBagelS
like image 44
Ashwini Chaudhary Avatar answered Nov 14 '22 23:11

Ashwini Chaudhary