 

How do I compare two sentence strings for similarity in Python?

I should mention first that I am using Tweepy. I found a way to filter out identical strings, but I am having a hard time filtering out similar ones.

I have two sentence strings that I need to compare (Tweepy keyword = "Donald Trump"):

String 1: "Trump Administration Dismisses Surgeon General Vivek Murthy (http)PUGheO7BuT5LUEtHDcgm"

String 2: "Trump Administration Dismisses Surgeon General Vivek Murthy (http)avGqdhRVOO"

As you can see, they are similar but not identical. I need a way to compare the two and get a numeric score that tells me whether the second tweet should be added to the first. I thought I had the solution with SequenceMatcher(), but it always printed 0.0, when I expected something greater than 0.5. SequenceMatcher seems to work only for single-word strings (correct me if I am wrong).
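A minimal sketch, assuming the two tweets exactly as quoted above, showing that SequenceMatcher does handle multi-word strings. It also shows one common way to get exactly 0.0: the first constructor argument of SequenceMatcher is the isjunk filter, not a string to compare, so passing the two strings as the first two arguments silently compares the second string against an empty one. Whether that is what happened in the original code is only a guess; the variable names good and bad are illustrative.

```python
from difflib import SequenceMatcher

# The two tweets from the question; "(http)" is the placeholder from the post.
tweet1 = "Trump Administration Dismisses Surgeon General Vivek Murthy (http)PUGheO7BuT5LUEtHDcgm"
tweet2 = "Trump Administration Dismisses Surgeon General Vivek Murthy (http)avGqdhRVOO"

# Correct call: isjunk=None (no junk filtering), then the two strings.
good = SequenceMatcher(None, tweet1, tweet2).ratio()

# Easy mistake: tweet1 is taken as isjunk, tweet2 as sequence a, and
# sequence b defaults to "", so nothing can match and the ratio is 0.0.
bad = SequenceMatcher(tweet1, tweet2).ratio()

print(round(good, 2))  # well above 0.5: the long shared prefix dominates
print(bad)  # 0.0
```

If the score still comes out as 0.0, checking the argument order of the SequenceMatcher call is a cheap first step.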

Now you are probably thinking, "just slice off the http portions." That won't work either, because it does not account for people's Twitter handles in tweets like "@cars: xyz zyx" and "@trucks: xyz zyx".
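One way to handle both the links and the handles is to normalize each tweet before scoring it. This is a sketch under assumptions: the helper names normalize_tweet and tweet_similarity and both regular expressions are hypothetical, and the t.co-style link used in the test is made up (the post only shows "(http)" placeholders). Real tweets may need more rules, for example for hashtags or "RT" prefixes.

```python
import re
from difflib import SequenceMatcher

# Hypothetical patterns: anything that looks like a URL, and an @handle
# optionally followed by a colon (as in "@cars: xyz zyx").
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+:?")


def normalize_tweet(text: str) -> str:
    """Drop URLs and @mentions, lowercase, and collapse runs of whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.lower().split())


def tweet_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] of two tweets after normalization."""
    return SequenceMatcher(None, normalize_tweet(a), normalize_tweet(b)).ratio()


print(normalize_tweet("@cars: xyz zyx"))  # xyz zyx
print(tweet_similarity("@cars: xyz zyx", "@trucks: xyz zyx"))  # 1.0
```

Note that if both tweets normalize to an empty string, ratio() returns 1.0, so very short tweets that are only a link or a handle may deserve special handling.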

Is there some way to compare the two texts? It should be simple, but for some reason the solution eludes me. I only started learning Python a week ago, and it still feels weird to use indentation to tell what is inside a function and what isn't.

LuxLunae asked Dec 07 '22 18:12


1 Answer

You can use the ratio() method of difflib.SequenceMatcher, for example:

from difflib import SequenceMatcher

a = "I love Coding"
b = "I love Codiing"

# None = no junk filter; the two strings to compare come second and third
ratio = SequenceMatcher(None, a, b).ratio()
print(ratio)  # 0.9629629629629629
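To turn the ratio into the yes/no decision the question asks for, you can compare it against a threshold. The sketch below is one possible approach, not part of the answer: is_near_duplicate, dedupe, and the 0.8 threshold are hypothetical choices to tune on real data. It uses real_quick_ratio() and quick_ratio(), which difflib documents as cheap upper bounds on ratio(), to skip the more expensive full comparison when a pair clearly cannot reach the threshold.

```python
from difflib import SequenceMatcher


def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Return True if a and b are at least `threshold` similar."""
    matcher = SequenceMatcher(None, a, b)
    # real_quick_ratio() and quick_ratio() are upper bounds on ratio();
    # if either is already below the threshold, ratio() cannot reach it.
    return (
        matcher.real_quick_ratio() >= threshold
        and matcher.quick_ratio() >= threshold
        and matcher.ratio() >= threshold
    )


def dedupe(tweets, threshold: float = 0.8):
    """Keep each tweet only if it is not a near-duplicate of one already kept."""
    kept = []
    for tweet in tweets:
        if not any(is_near_duplicate(tweet, seen, threshold) for seen in kept):
            kept.append(tweet)
    return kept


tweets = ["I love Coding", "I love Codiing", "Something else entirely"]
print(dedupe(tweets))  # ['I love Coding', 'Something else entirely']
```

Comparing every new tweet against every kept one is quadratic, which is fine for a modest stream but may need a smarter index for large volumes.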


Pedro Lobito answered Feb 16 '23 05:02