
How to find the substring if the substring has random characters replaced?

Let's say we have a string in Python:

original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"

And we want to find the starting index of substring = "ChristmasWhen". This is straightforward in Python:

>>> substring ="ChristmasWhen"
>>> original_string.find(substring)
18

and this checks out

>>> "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"[18]
'C'

If we look for a string that doesn't exist, find() returns -1.

Here is my problem:

I have a substring which is guaranteed to be from the original string. However, characters in this substring have been randomly replaced with another character.

How could I algorithmically find the beginning coordinate of the substring (or at least, check if it's possible) if the substring has random characters '-' replacing certain letters?

Here's a concrete example:

original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
substring = '-hri-t-asW-en'

Naturally, original_string.find('-hri-t-asW-en') returns -1. However, it is possible to find that hri begins at index 19, and therefore, with the one-character prefix -, the substring '-hri-t-asW-en' must begin at index 18.

EB2127 asked Dec 22 '22 19:12


2 Answers

This is typically what regular expressions are for: finding patterns. You can try:

import re                       # use regexp
original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
r = re.compile(".hri.t.asW.en") # constructs the search machinery
res = r.search(original_string) # search
print (res.group(0))            # get results

The result will be:

ChristmasWhen
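
Since the question asks for the starting index rather than the matched text, here is a minimal sketch of reading it off the match object. It assumes a match exists; the variable name start_index is only illustrative.

```python
import re

original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
res = re.search(".hri.t.asW.en", original_string)

start_index = res.start()   # index where the match begins
print(start_index)          # 18
print(res.span())           # (18, 31): start and end (exclusive) of the match
```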

Now if your input (the search string) must use '-' as a wildcard you can then translate it to obtain the right regular expression:

import re 
original_string = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"
s = "-hri-t-asW-en"              # supposedly input by the user
s = s.replace('-','.')           # translate to regexp syntax
r = re.compile(s)
res = r.search(original_string)
print (res.group(0))
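
A plain replace('-', '.') can misbehave if the search string contains other characters that are special in a regular expression, such as '$' or '('. One possible way around this, sketched below, is to escape every non-wildcard character with re.escape. The helper name find_masked and its wildcard parameter are hypothetical, and the sketch assumes the original text never contains a literal '-' that should be matched as itself.

```python
import re

def find_masked(original, masked, wildcard="-"):
    """Return the start index of masked in original, or -1 if absent.

    Each wildcard character matches exactly one arbitrary character;
    every other character is matched literally.
    """
    pattern = "".join(
        "." if ch == wildcard else re.escape(ch) for ch in masked
    )
    # re.DOTALL lets '.' stand in for a newline as well.
    match = re.search(pattern, original, re.DOTALL)
    return match.start() if match else -1

print(find_masked("TwasTheNightBeforeChristmasWhenAllThroughTheHouse",
                  "-hri-t-asW-en"))                   # 18
print(find_masked("cost: $4.50 (approx.)", "$-.50"))  # 6
```

Returning -1 when nothing matches mirrors the behaviour of str.find(), so the helper can be dropped in where find() was used before.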
Jean-Baptiste Yunès answered May 21 '23 22:05


Perhaps use a regular expression? For instance, the . (dot) character matches any single character (other than a newline, by default). So if you modify your substring to use dots instead of dashes for the erased letters, you can use re.search to locate the pattern:

import re
text = 'TwasTheNightBeforeChristmasWhenAllThroughTheHouse'
re.search('.hri.t.asW.en', text)
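
As a hedged illustration of two details mentioned above: re.search returns None when nothing matches, and the dot skips newlines unless re.DOTALL is passed. The string multiline is a made-up example, not from the question.

```python
import re

text = "TwasTheNightBeforeChristmasWhenAllThroughTheHouse"

match = re.search(".hri.t.asW.en", text)
# re.search returns None when nothing matches, so check before using it.
start = match.start() if match else -1
print(start)  # 18

# By default '.' does not match a newline; re.DOTALL lifts that restriction.
multiline = "Before\nhristmas"
no_flag = re.search(".hristmas", multiline)
with_flag = re.search(".hristmas", multiline, re.DOTALL)
print(no_flag)            # None
print(with_flag.start())  # 6
```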
IronMan answered May 21 '23 20:05