I'm trying to do a comparison of strings in Python. My strings contain titles which can be structured a number of different ways:
'Title'
'Title: Subtitle'
'Title - Subtitle'
'Title, Subtitle'
'Title Subtitle'
Is it possible to do a similarity comparison in Python that can determine that match('Title: Subtitle', 'Title - Subtitle') returns True (or however it would be constructed)?
Basically I'm trying to determine if they're the same title even if the splitting is different.
# A plain equality check is False here, but I want these two to count as a match.
if 'Title: Subtitle' == 'Title - Subtitle':
    match = True
else:
    match = False
Some titles might also be stored as 'The Title: The Subtitle' or 'Title, The: Subtitle, The'. I think that adds a bit of complexity, but I could probably get around it by reconstructing the string.
What you're trying to do has already been implemented very well in the third-party jellyfish package, which provides approximate string-matching functions such as Levenshtein distance:
>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
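To make it clearer what that number means, here is a minimal pure-Python sketch of Levenshtein distance (the count of single-character insertions, deletions, and substitutions needed to turn one string into the other). The function name levenshtein is my own for illustration; it is not jellyfish's implementation, just the textbook dynamic-programming version.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the already-processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein('jellyfish', 'smellyfish'))              # 2
print(levenshtein('Title: Subtitle', 'Title - Subtitle'))  # 2
```

A small distance relative to the string length suggests the two titles differ only in their separator, but you would still have to pick a cutoff that suits your data.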
The standard library's difflib module provides a function, get_close_matches, which does fuzzy string matching:
>>> import difflib
>>> difflib.get_close_matches('python', ['snakes', 'thon.py', 'pythin'])
['pythin', 'thon.py']  # ordered by similarity score
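For the yes/no comparison asked about in the question, difflib.SequenceMatcher may be a closer fit than get_close_matches, since it scores a single pair of strings. The sketch below is only one way to wire it up: the helper name similar and the 0.8 threshold are my own assumptions, and the threshold would need tuning against real titles.

```python
import difflib


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Return True if a and b are at least `threshold` similar (0.0 to 1.0)."""
    # ratio() returns 1.0 for identical strings and 0.0 for nothing in common.
    # Lower-casing first makes the comparison case-insensitive.
    ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold  # threshold is an assumed cutoff, not a standard value


print(similar('Title: Subtitle', 'Title - Subtitle'))      # True
print(similar('Title: Subtitle', 'Completely Different'))  # False
```

Because this is a fuzzy score, two genuinely different titles that share most of their characters can still pass, so it is worth checking a sample of your data before trusting any particular cutoff.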
You can use the in keyword. It isn't a similarity comparison, but it checks whether either part of the title appears in the string:
s = "Title: Subtitle"
# The membership test already yields a boolean, so assign it directly.
match = "Title" in s or "Subtitle" in s
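The in check is true for any string that merely contains one of the words, so a stricter alternative is to normalize both titles and then compare them exactly. The following is a rough sketch, not a tested solution: the names normalize, same_title, and ARTICLES are hypothetical, and dropping articles everywhere is an assumption made to cover the 'Title, The: Subtitle, The' form from the question.

```python
import re

# Assumption: leading or trailing articles carry no identifying information.
ARTICLES = {"the", "a", "an"}


def normalize(title: str) -> str:
    """Lower-case the title, drop punctuation/separators, and remove articles."""
    # \w+ keeps runs of letters, digits, and underscores; ':', '-', and ',' are dropped.
    words = re.findall(r"\w+", title.lower())
    return " ".join(w for w in words if w not in ARTICLES)


def same_title(a: str, b: str) -> bool:
    """Return True if both titles normalize to the same string."""
    return normalize(a) == normalize(b)


print(same_title('Title: Subtitle', 'Title - Subtitle'))                    # True
print(same_title('The Title: The Subtitle', 'Title, The: Subtitle, The'))   # True
print(same_title('Title: Subtitle', 'Title: Other Subtitle'))               # False
```

This only handles differences in separators and articles. Titles with typos or reordered words would still need one of the fuzzy approaches above.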