Let's say we have a string in Python:
original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
And we are interested in finding the beginning coordinates of the substring substring ="ChristmasWhen"
. This is very straightforward in Python, i.e.
>>> substring ="ChristmasWhen"
>>> original_string.find(substring)
18
and this checks out
>>> "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"[18]
'C'
If we tried to look for a string which didn't exist, find()
will return -1.
Here is my problem:
I have a substring which is guaranteed to be from the original string. However, characters in this substring have been randomly replaced with another character.
How could I algorithmically find the beginning coordinate of the substring (or at least, check if it's possible) if the substring has random characters '-'
replacing certain letters?
Here's a concrete example:
original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
substring = '-hri-t-asW-en'
Naturally, if I try original_string.find('-hri-t-asW-en')
, but it would be possible to find hri
begins at 19, and therefore with the prefix -
, the substring original_string.find('-hri-t-asW-en')
must be 18.
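The reasoning above, treating each '-' as "any single character" and checking every possible alignment, can also be written without regular expressions. This is a minimal sketch of a naive sliding-window comparison, assuming '-' never occurs as a real character in the original string; the function name find_with_wildcard is hypothetical, and the approach costs O(n·m) comparisons for a text of length n and a pattern of length m.

```python
def find_with_wildcard(text: str, pattern: str, wildcard: str = "-") -> int:
    """Return the first index where pattern matches text, treating each
    `wildcard` character in pattern as "any single character".
    Returns -1 if there is no match, mirroring str.find()."""
    n, m = len(text), len(pattern)
    # Try every start position where the whole pattern still fits.
    for start in range(n - m + 1):
        # A wildcard position always matches; other positions must be equal.
        if all(p == wildcard or p == t
               for p, t in zip(pattern, text[start:start + m])):
            return start
    return -1


original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
print(find_with_wildcard(original_string, "-hri-t-asW-en"))  # 18
```

Like str.find(), it returns 0 for an empty pattern and -1 when the pattern is longer than the text.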
This is typically what regular expressions are for : find patterns. You can then try:
import re # use regexp
original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
r = re.compile(".hri.t.asW.en") # constructs the search machinery
res = r.search(original_string) # search
print (res.group(0)) # get results
result will be:
ChristmasWhen
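Since the question asks for the beginning coordinate rather than the matched text, the match object's start() method gives it directly. This sketch also handles the no-match case, because re.search() returns None instead of -1 when nothing matches.

```python
import re

original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
match = re.search(r".hri.t.asW.en", original_string)

if match is None:
    # re.search() returns None when nothing matches (str.find() returns -1).
    start = -1
else:
    # match.start() is the index of the first matched character.
    start = match.start()

print(start)  # 18
```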
Now if your input (the search string) must use '-' as a wildcard you can then translate it to obtain the right regular expression:
import re
original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
s = ".hri.t.asW.en" # supposedly inputed by user
s = s.replace('-','.') # translate to regexp syntax
r = re.compile(s)
res = r.search(original_string)
print (res.group(0))
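A plain replace('-', '.') treats any other regex metacharacters in the input (such as '.', '*' or '(') as pattern syntax. A slightly more defensive sketch, assuming '-' is the only wildcard, splits on '-', escapes each literal piece with re.escape(), and joins the pieces with '.'. The function names wildcard_to_regex and wildcard_find are hypothetical; re.DOTALL is added so that '.' also matches a newline.

```python
import re


def wildcard_to_regex(pattern: str, wildcard: str = "-"):
    # Escape every literal piece so special characters are matched literally,
    # then join the pieces with '.', which matches any single character.
    pieces = pattern.split(wildcard)
    return re.compile(".".join(re.escape(piece) for piece in pieces), re.DOTALL)


def wildcard_find(text: str, pattern: str, wildcard: str = "-") -> int:
    # Mirror str.find(): start index of the first match, or -1 if none.
    match = wildcard_to_regex(pattern, wildcard).search(text)
    return match.start() if match else -1


original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
print(wildcard_find(original_string, "-hri-t-asW-en"))  # 18
print(wildcard_find("axb a.b", "a.-"))  # 4, because the '.' is literal here
```

With the plain replace() version, the second call would match "axb" at index 0 instead.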
perhaps use a regular expression? For instance, you can use the .
(dot character) to match any character (other than a newline, by default). So if you modify your substring to use dots instead of dashes for the erased letters in the string, you can use re.search
to locate those patterns:
text = 'TwasTheNightBeforeChristmasWhenAllThroughTheHouse';
re.search('.hri.t.asW.en', text)
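Because letters have been replaced, a short masked substring may match in more than one place, so "check if it's possible" can mean listing every candidate position. re.finditer() yields each non-overlapping match from left to right; this is a sketch using a deliberately short example pattern.

```python
import re

text = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"

# ".he" stands for a masked substring such as "-he"; it matches several times.
positions = [m.start() for m in re.finditer(r".he", text)]
print(positions)  # [4, 27, 41]
```

If more than one position comes back, the masked substring alone cannot pin down a unique beginning coordinate.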
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With