In my case, I want to capture runs of a repeated character in text, together with at most 3 characters before and after each run. For example:

| original | prefix | repeat | postfix |
|---|---|---|---|
| 1aab | 1 | aa | b |
| 1aaab | 1 | aaa | b |
| 1234aaabcde | 234 | aaa | bcd |

I wrote this regex in Python:
```python
import re
reobj = re.compile("(?P<prefix>.{0,3}) (?P<repeat>(?P<infix>[a-z])(?P=infix){1,}) (?P<postfix>.{0,3})", re.IGNORECASE | re.VERBOSE | re.DOTALL)
```
but it gives this result:

| original | prefix | repeat | postfix | as desired? |
|---|---|---|---|---|
| 1aab | 1 | aa | b | yes |
| 1aaab | 1a | aa | b | no |
| 1234aaabcde | 234 | aaa | bcd | yes |
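
A minimal sketch that reproduces the table above, assuming the pattern is applied with `re.search` (the prefix `234` in the last row implies the match does not start at index 0). `split_run` is a hypothetical helper that only returns the three groups of interest.

```python
import re

# The question's pattern: greedy prefix .{0,3}; VERBOSE makes the spaces insignificant.
GREEDY = re.compile(
    r"(?P<prefix>.{0,3}) (?P<repeat>(?P<infix>[a-z])(?P=infix){1,}) (?P<postfix>.{0,3})",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)

def split_run(pattern, text):
    """Hypothetical helper: (prefix, repeat, postfix) of the first match, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.group("prefix", "repeat", "postfix")

for text in ["1aab", "1aaab", "1234aaabcde"]:
    print(text, split_run(GREEDY, text))
# 1aab ('1', 'aa', 'b')
# 1aaab ('1a', 'aa', 'b')        <- the prefix takes one "a" from the run
# 1234aaabcde ('234', 'aaa', 'bcd')
```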

How can I make `repeat` get the whole run in every case? Thanks.
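You can keep the 4 capture groups (the `infix` group only captures the single character to be repeated), but make the prefix quantifier lazy: `.{0,3}?` instead of `.{0,3}`. A greedy prefix first takes as many characters as it can and gives them back only as far as needed for the rest of the pattern to match. On `1aaab` it tries `1aa` (fails, because only one `a` is left before `b`), then `1a`, which already succeeds with `repeat` = `aa`. A lazy prefix tries the shortest length first, so the whole run `aaa` goes to `repeat`. Apply the pattern with `re.search` (or `re.finditer`), not `re.match`, since in `1234aaabcde` the match starts at index 1: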
```
(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})
```
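
A quick check of the lazy pattern against the desired table from the question, as a sketch. `re.VERBOSE` is dropped here only because this pattern contains no whitespace to ignore; keeping it would be harmless.

```python
import re

# Lazy prefix: .{0,3}? tries 0, 1, 2, then 3 characters, so the
# repeated run is claimed by the "repeat" group as early as possible.
LAZY = re.compile(
    r"(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})",
    re.IGNORECASE | re.DOTALL,
)

# Desired (prefix, repeat, postfix) per input, taken from the question.
expected = {
    "1aab": ("1", "aa", "b"),
    "1aaab": ("1", "aaa", "b"),
    "1234aaabcde": ("234", "aaa", "bcd"),
}

for text, want in expected.items():
    # search(), not match(): in "1234aaabcde" the 3-char prefix starts at index 1
    m = LAZY.search(text)
    got = m.group("prefix", "repeat", "postfix")
    print(text, got, got == want)  # each line ends with True
```

With `re.IGNORECASE`, the backreference `(?P=infix)` also matches case-insensitively, so `aA` counts as a run; drop the flag if that is not wanted.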