Greedy backreference in Python's regular expression?

Question

I want to capture runs of repeated characters in text, together with at most 3 characters before and after each run. For example:

original	prefix	repeat	postfix
1aab	1	aa	b
1aaab	1	aaa	b
1234aaabcde	234	aaa	bcd

I wrote the following regular expression in Python:

import re

reobj = re.compile("(?P<prefix>.{0,3})    (?P<repeat>(?P<infix>[a-z])(?P=infix){1,})    (?P<postfix>.{0,3})", re.IGNORECASE | re.VERBOSE | re.DOTALL)

but it gives this result:

original	prefix	repeat	postfix	is desired?
1aab	1	aa	b	yes
1aaab	1a	aa	b	no
1234aaabcde	234	aaa	bcd	yes
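
One way to see where the second row comes from is to emulate the order in which a greedy .{0,3} tries prefix lengths at a single start position: longest first, then shorter. The sketch below is only an illustration of that order, not how the re module is implemented; the names tail and first_greedy_split are hypothetical, and it looks at one start position rather than scanning the whole string.

```python
import re

# Everything after the prefix group in the question's pattern.
# "tail" and "first_greedy_split" are hypothetical names for this sketch.
tail = re.compile(
    r"(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})",
    re.IGNORECASE | re.DOTALL,
)

def first_greedy_split(text, start=0):
    """Try prefix lengths 3, 2, 1, 0 at `start`, the order a greedy .{0,3} uses."""
    longest = min(3, len(text) - start)
    for n in range(longest, -1, -1):
        # Anchor the rest of the pattern right after an n-character prefix.
        m = tail.match(text, start + n)
        if m:
            return text[start:start + n], m.group("repeat"), m.group("postfix")
    return None

print(first_greedy_split("1aaab"))  # ('1a', 'aa', 'b')
```

A 3-character prefix ("1aa") leaves only "ab", which has no repeat, so the next attempt with a 2-character prefix ("1a") is the first that succeeds, and the prefix keeps one of the repeated characters.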

Any help? Thanks.

The fourth bird · Accepted Answer

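You can use 4 capture groups, where the infix group only captures the single character to be repeated. The key change is the lazy quantifier on the prefix, .{0,3}?: a greedy .{0,3} takes as many characters as it can while still leaving a two-character repeat, which is why 1aaab came out as 1a / aa / b.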

(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})
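
As a quick check, the two patterns can be run side by side on the inputs from the question. This is a minimal sketch: the flags are carried over from the question (re.VERBOSE is dropped because neither pattern here contains whitespace), and split_match is a hypothetical helper name.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# The question's pattern (greedy prefix) and the answer's pattern (lazy prefix).
greedy = re.compile(
    r"(?P<prefix>.{0,3})(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})",
    FLAGS,
)
lazy = re.compile(
    r"(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})",
    FLAGS,
)

def split_match(pattern, text):
    """Return (prefix, repeat, postfix) for the first match, or None."""
    m = pattern.search(text)
    return m.group("prefix", "repeat", "postfix") if m else None

for text in ("1aab", "1aaab", "1234aaabcde"):
    print(text, split_match(greedy, text), split_match(lazy, text))
```

For 1aaab the greedy pattern yields ('1a', 'aa', 'b') while the lazy one yields ('1', 'aaa', 'b'); the other two inputs give the same result with both patterns.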


Tags: python, regex, backreference

oyster
