Find all occurrences of a substring (including overlap)?

Okay, so I found this: How to find all occurrences of a substring?

It says that to get the indices of overlapping occurrences of a substring as a list, you can use:

[m.start() for m in re.finditer('(?=SUBSTRING)', 'STRING')]

That works, but my problem is that both the string and the substring to look for are held in variables. I don't know enough about regular expressions to know how to deal with that. I can get it to work for non-overlapping substrings; that's just:

[m.start() for m in re.finditer(p3, p1)]

Edit:

Because someone asked, I'll go ahead and specify. p1 and p3 could be any strings, but if they were, for example, p3 = "tryt" and p1 = "trytryt", the result should be [0, 3].

Kevin asked Jan 12 '23 08:01



2 Answers

The arguments to re.finditer are plain strings, so if you have the substring in a variable, simply format it into the regular expression. Something like '(?={0})'.format(p3) is a start. Since various symbols have special meaning in a regular expression, you will want to escape them. Luckily, the re module includes re.escape for just such a need.

[m.start() for m in re.finditer('(?={0})'.format(re.escape(p3)), p1)]
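For anyone who wants to try this end to end, here is a minimal sketch of the same idea wrapped in a function. The function name and its parameter names are made up for illustration; only re.finditer, re.escape and the lookahead come from the answer above. The last call shows why escaping matters: without it, the "." in the substring would match any character.

```python
import re


def find_overlapping(needle, haystack):
    """Return the start index of every (possibly overlapping) occurrence.

    Hypothetical helper, not part of the re module.
    """
    # The lookahead (?=...) matches without consuming characters, so the
    # scan can find a new match starting inside the previous one.
    # re.escape makes the needle match literally, even if it contains
    # regex metacharacters such as "." or "*".
    pattern = '(?={0})'.format(re.escape(needle))
    return [m.start() for m in re.finditer(pattern, haystack)]


p3 = "tryt"
p1 = "trytryt"
print(find_overlapping(p3, p1))           # [0, 3]
print(find_overlapping("a.b", "a.b axb")) # [0]
```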
D.Shawley answered Jan 23 '23 03:01



Regex might be overkill here:

>>> word = 'tryt'
>>> text = 'trytryt'
>>> [i for i, _ in enumerate(text) if text.startswith(word, i)]
[0, 3]
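A variation on the same no-regex idea, in case the text is long: instead of testing every index with startswith, you can let str.find jump to the next hit and restart one character past it, which is what allows overlaps. This is only a sketch; the function name is hypothetical and the variable names follow the snippet above.

```python
def find_all_overlapping(text, word):
    """Return the start index of every (possibly overlapping) occurrence.

    Hypothetical helper built on str.find.
    """
    positions = []
    i = text.find(word)
    while i != -1:
        positions.append(i)
        # Restart one character after the last hit (not len(word) after it),
        # so matches that overlap the previous one are still found.
        i = text.find(word, i + 1)
    return positions


word = 'tryt'
text = 'trytryt'
print(find_all_overlapping(text, word))  # [0, 3]
```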
Eric answered Jan 23 '23 03:01
