I want to match every substring that begins with w and ends with d, using a regex.
For example, for the input worldworld it should return ('worldworld', 'world', 'world'). (Note: world appears twice, but the two matches are distinct because they are at different positions in the string.)
For this purpose I ended up with the following program and regex:
import re

s = '''worldworld'''

for g in re.finditer(r'(?=(w.*d))(?=(w.*?d))', s):
    # group 1 is the greedy (longest) match, group 2 the lazy (shortest) one
    print(g.start(1), g.end(1), g[1])
    print(g.start(2), g.end(2), g[2])
    print('-' * 40)
This prints:
0 10 worldworld
0 5 world
----------------------------------------
5 10 world
5 10 world
----------------------------------------
It finds all the substrings, but some of them are duplicates (notice the starting and ending positions of the groups).
I can filter the groups afterwards by their starting and ending positions, but I'm wondering whether my regex can be changed so that it only returns unique groups.
Can I change this regex so that a group only matches when it differs from the other one? If yes, how? I'm also open to other suggestions for solving this problem.
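For reference, the post-filtering by position mentioned above could look roughly like the sketch below. It keeps the regex from the question unchanged and only reports a group when its (start, end) span has not been seen before; the names `seen` and `unique` are just illustrative.

```python
import re

s = "worldworld"
pattern = re.compile(r'(?=(w.*d))(?=(w.*?d))')

seen = set()   # spans that were already reported
unique = []    # (start, end, text), in order of first appearance

for m in pattern.finditer(s):
    for group in (1, 2):
        span = m.span(group)
        if span not in seen:
            seen.add(span)
            unique.append((span[0], span[1], m[group]))

for start, end, text in unique:
    print(start, end, text)
```

For worldworld this leaves three entries, one per distinct position.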
I don't believe this can be done with a single regexp. But it's straightforward with a nested loop:
import re

test = "wddddd"

# need to compile the tail regexp to get a version of
# `finditer` that allows specifying a start index
tailre = re.compile("(d)")

for wg in re.finditer("(w)", test):
    start = wg.start(1)
    for dg in tailre.finditer(test, wg.end(1)):
        end = dg.end(1)
        print(test[start : end], "at", (start, end))
That displays:
wd at (0, 2)
wdd at (0, 3)
wddd at (0, 4)
wdddd at (0, 5)
wddddd at (0, 6)
With
test = "worldworldworld"
instead:
world at (0, 5)
worldworld at (0, 10)
worldworldworld at (0, 15)
world at (5, 10)
worldworld at (5, 15)
world at (10, 15)
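If the result is needed as a list rather than printed, the same nested loop can be wrapped in a small helper. This is only a sketch: the function name `spans_between` and its `head`/`tail` parameters are made up for illustration, and it assumes single-character delimiters as in the question.

```python
import re

def spans_between(text, head="w", tail="d"):
    """Return (start, end, substring) for every substring of `text`
    that starts with `head` and ends with `tail`."""
    head_re = re.compile(re.escape(head))
    tail_re = re.compile(re.escape(tail))
    results = []
    for h in head_re.finditer(text):
        # only look for tails that come after this head
        for t in tail_re.finditer(text, h.end()):
            results.append((h.start(), t.end(), text[h.start():t.end()]))
    return results

for start, end, sub in spans_between("worldworld"):
    print(sub, "at", (start, end))
```

Every (start, end) pair is produced exactly once, so no filtering for duplicates is needed afterwards.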
One option would be to keep the lazy second group, but follow it with a positive lookahead for .*d (greedy). That lookahead only succeeds if another d comes after the lazy match, which ensures that when the lazy second group matches, it's not the same as the greedy first group:
(?=(w.*d))(?:(?=(w.*?d)(?=.*d)))?
https://regex101.com/r/UI9ds7/2
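A quick way to try this pattern from Python is sketched below. The helper name `longest_and_shortest` is made up for illustration; the only thing it relies on is that an optional group that did not participate in the match is reported as None.

```python
import re

pattern = re.compile(r'(?=(w.*d))(?:(?=(w.*?d)(?=.*d)))?')

def longest_and_shortest(text):
    """Collect (start, end, substring) for each group that took part in a match."""
    found = []
    for m in pattern.finditer(text):
        for group in (1, 2):
            # group 2 is None when the lazy match would equal the greedy one
            if m[group] is not None:
                found.append((m.start(group), m.end(group), m[group]))
    return found

for start, end, sub in longest_and_shortest("worldworld"):
    print(start, end, sub)
```

Note that for each starting w this pattern yields at most the longest and the shortest candidate. With an input such as worldworldworld the substring at (0, 10) is not reported, so this removes the duplicates from the original regex but does not enumerate every possible substring.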