I'm trying to do a comparison of strings in Python. My strings contain titles which can be structured a number of different ways:
'Title'
'Title: Subtitle'
'Title - Subtitle'
'Title, Subtitle'
'Title Subtitle'
Is it possible to do a similarity comparison in Python so that it can determine that match('Title: Subtitle', 'Title - Subtitle') = True (or however it would be constructed)?
Basically I'm trying to determine if they're the same title even if the splitting is different.
if 'Title: Subtitle' == 'Title - Subtitle':
    match = 'True'
else:
    match = 'False'
There are also some that might be stored as The Title: The Subtitle or Title, The: Subtitle, The, although I think that may add a bit of complexity that I could probably get around by reconstructing the string.
What you're trying to do has already been implemented very well in the jellyfish package.
>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
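To make that number concrete, here is a minimal pure-Python sketch of the Levenshtein distance (the count of single-character insertions, deletions, and substitutions needed to turn one string into the other). The function name levenshtein is made up for illustration, and this is not how jellyfish itself is implemented; it only shows the idea.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or keep it)
            ))
        previous = current
    return previous[-1]


print(levenshtein('jellyfish', 'smellyfish'))             # 2
print(levenshtein('Title: Subtitle', 'Title - Subtitle'))  # 2
```

A small distance relative to the string length suggests the two titles are the same; where to draw the line is a judgment call for your data.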
The standard library's difflib module provides a function, get_close_matches, which does fuzzy string matching.
>>> import difflib
>>> difflib.get_close_matches('python', ['snakes', 'thon.py', 'pythin'])
['pythin', 'thon.py'] # ordered by similarity score
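For the yes/no match() the question asks for, one possible sketch is to normalize both titles and compare them with difflib.SequenceMatcher, which is the class get_close_matches is built on. The helper names normalize and match and the 0.9 threshold are assumptions chosen for illustration, not part of difflib.

```python
import difflib
import re


def normalize(title):
    """Lowercase and collapse every run of non-alphanumeric characters to one space."""
    return re.sub(r'[^a-z0-9]+', ' ', title.lower()).strip()


def match(a, b, threshold=0.9):
    """Return True if the normalized titles are at least `threshold` similar."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


print(match('Title: Subtitle', 'Title - Subtitle'))  # True
print(match('Title, Subtitle', 'Title Subtitle'))    # True
print(match('Title: Subtitle', 'Other Book'))        # False
```

After normalization all the separator variants in the question become the same string, so the threshold only matters for near misses such as typos. Note that the character class above drops non-ASCII letters, which would need adjusting for titles in other scripts.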
You can use the in keyword. It isn't a similarity comparison, only a substring check, but it may do what you want:
s = "Title: Subtitle"
if "Title" in s or "Subtitle" in s:
    match = 'True'
else:
    match = 'False'
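Because the check above uses or, it also succeeds when only one of the two parts is present. A stricter variant, sketched here with the hypothetical helpers parts and contains_all, splits one title on its separators and requires every part to appear in the other string:

```python
import re


def parts(title):
    """Split a title on ':' or ',' or a spaced '-' and lowercase each part."""
    pieces = re.split(r'\s*[:,]\s*|\s+-\s+', title)
    return [p.strip().lower() for p in pieces if p.strip()]


def contains_all(title_parts, s):
    """Return True only if every part occurs somewhere in s (case-insensitive)."""
    s = s.lower()
    return all(p in s for p in title_parts)


print(contains_all(parts('Title: Subtitle'), 'Title - Subtitle'))  # True
print(contains_all(parts('Title: Subtitle'), 'Title'))             # False
```

This is still a substring test, so a short part can match inside a longer unrelated word; it is a sketch of the idea rather than a robust comparison.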