First, I should say that I am using Tweepy. I found a way to filter out identical strings, but I am having a hard time filtering out similar ones.
I have two sentence strings that I need to compare (Tweepy keyword ="Donald Trump")
String 1: "Trump Administration Dismisses Surgeon General Vivek Murthy (http)PUGheO7BuT5LUEtHDcgm"
String 2: "Trump Administration Dismisses Surgeon General Vivek Murthy (http)avGqdhRVOO"
As you can see, they are similar but not the same. I need a way to compare the two and get a numeric value to decide whether the second tweet should be grouped with the first. I thought I had the solution when I used SequenceMatcher(), but it always printed out 0.0. I was expecting it to be greater than 0.5. However, SequenceMatcher only seems to work for one-word strings (correct me if I am wrong).
Now you are probably thinking, "just slice off the http portions". That won't work either, because it does not account for tweets that differ only in the account name, like @cars: xyz zyx and @trucks: xyz zyx.
Is there some way to compare the two texts? It should be simple, but for some reason the solution eludes me. I just learned Python a week ago. It still feels weird using indents to tell what's inside a function and what isn't.
You can use SequenceMatcher().ratio() from difflib, i.e.:
from difflib import SequenceMatcher
a = "I love Coding"
b = "I love Codiing"
ratio = SequenceMatcher(None, a, b).ratio()
# 0.9629629629629629
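The same call works on whole sentences, so here is a minimal sketch applied to the two tweets from the question. The helper names `similarity` and `is_similar` and the 0.5 cutoff are assumptions for illustration (the cutoff is just the value the question hoped to exceed), not part of difflib. The last lines show one possible cause of the 0.0 result, assuming the original call left out the first argument: the strings then land in the wrong parameters and an empty string is compared.

```python
from difflib import SequenceMatcher

tweet_1 = "Trump Administration Dismisses Surgeon General Vivek Murthy (http)PUGheO7BuT5LUEtHDcgm"
tweet_2 = "Trump Administration Dismisses Surgeon General Vivek Murthy (http)avGqdhRVOO"


def similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    # The first argument is the optional "junk" filter; None disables it.
    return SequenceMatcher(None, a, b).ratio()


def is_similar(a, b, threshold=0.5):
    """Hypothetical helper: treat two tweets as duplicates above a cutoff."""
    return similarity(a, b) > threshold


print(similarity(tweet_1, tweet_2))   # well above 0.5 for these two tweets
print(is_similar(tweet_1, tweet_2))

# Possible cause of 0.0 (an assumption about the original code):
# without the leading None, tweet_1 is taken as the junk filter,
# tweet_2 as the first sequence, and the second sequence stays empty.
print(SequenceMatcher(tweet_1, tweet_2).ratio())
```

The threshold probably needs tuning on real data: short tweets that differ only in an account name can score lower than long tweets that differ only in a link.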