
Greedy backreference in Python's regular expression?

I want to capture runs of a repeated character in text, together with at most 3 characters before and after each run. For example:

original     prefix  repeat  postfix
1aab         1       aa      b
1aaab        1       aaa     b
1234aaabcde  234     aaa     bcd

I wrote this regex in Python:

reobj = re.compile("(?P<prefix>.{0,3})    (?P<repeat>(?P<infix>[a-z])(?P=infix){1,})    (?P<postfix>.{0,3})", re.IGNORECASE | re.VERBOSE | re.DOTALL)

but it gives this result:

original     prefix  repeat  postfix  desired?
1aab         1       aa      b        yes
1aaab        1a      aa      b        no
1234aaabcde  234     aaa     bcd      yes

Any help? Thanks.

oyster asked May 04 '26 08:05


1 Answer

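Make the prefix quantifier lazy (.{0,3}?), so that it takes as few characters as possible and leaves the whole run to the repeat group. With the greedy .{0,3}, the prefix first grabs up to 3 characters and backtracks only until the rest of the pattern can match, which is why 1aaab produced the prefix 1a. You can keep 4 capture groups, where group infix only captures the single character to be repeated.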

(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})
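To compare the two behaviours side by side, here is a minimal sketch that runs the greedy pattern from the question and the lazy pattern above on the three sample strings. The names GREEDY, LAZY and split_repeat are made up for this illustration, and it assumes the same re.IGNORECASE and re.DOTALL flags as the question (re.VERBOSE is dropped because the padding spaces are removed).

```python
import re

# Greedy prefix, as in the question (padding spaces removed, so no re.VERBOSE).
GREEDY = re.compile(
    r"(?P<prefix>.{0,3})(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})",
    re.IGNORECASE | re.DOTALL,
)

# Lazy prefix, as in the answer. The flags are an assumption carried over
# from the question; the answer itself does not state any.
LAZY = re.compile(
    r"(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})",
    re.IGNORECASE | re.DOTALL,
)


def split_repeat(pattern, text):
    """Return (prefix, repeat, postfix) for the first match, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.group("prefix", "repeat", "postfix")


for text in ("1aab", "1aaab", "1234aaabcde"):
    print(text, split_repeat(GREEDY, text), split_repeat(LAZY, text))
```

For 1aaab the greedy pattern yields ('1a', 'aa', 'b') while the lazy one yields ('1', 'aaa', 'b'). The lazy prefix still reaches 3 characters for 1234aaabcde because the search tries the leftmost start positions first and extends the prefix only as far as needed to reach the run.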


The fourth bird answered May 06 '26 22:05


