Regular expression for finding a sub-string

Question

I am trying to find all occurrences of a substring using a regular expression. The substring has three parts: it starts with one or more 'A', followed by one or more 'N', and ends with one or more 'A'. For example, given the string 'AAANAANABNA', parsing it should yield the two substrings 'AAANAA' and 'AANA'. I tried the code below.

import regex as re  # third-party 'regex' module; the built-in re has no 'overlapped' option
reg_a='A+N+A+'
s='AAANAANABNA'
sub_str=re.findall(reg_a,s,overlapped=True)
print(sub_str)

It gives the following output:

['AAANAA', 'AANAA', 'ANAA', 'AANA', 'ANA']

But I want the output to be:

['AAANAA', 'AANA']

That is, the trailing A's of one match should serve as the leading A's of the next match. How can I achieve that?

Wiktor Stribiżew · Accepted Answer

Make sure there is no A immediately to the left of the match:

>>> reg_a='(?<!A)A+N+A+'
>>> print( re.findall(reg_a,s,overlapped=True) )
['AAANAA', 'AANA']

The (?<!A)A+N+A+ pattern matches:

(?<!A) - a negative lookbehind that matches a location that is not immediately preceded by A
A+ - one or more As
N+ - one or more Ns
A+ - one or more As
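
To see what the lookbehind contributes on its own, the sketch below lists the positions where (?<!A) holds and an A follows, that is, the start of each run of As. It uses only the standard re module. The helper name run_starts is hypothetical and not part of the answer.

```python
import re

def run_starts(text, ch="A"):
    """Return the indices where a run of `ch` begins (hypothetical helper)."""
    # (?<!A) rejects positions preceded by A; (?=A) requires an A at the position.
    pattern = r"(?<!{0})(?={0})".format(re.escape(ch))
    return [m.start() for m in re.finditer(pattern, text)]

s = "AAANAANABNA"
print(run_starts(s))  # [0, 4, 7, 10]
```

Only these positions can begin a match once the lookbehind is in place. Of the four, just 0 and 4 are followed by N+A+, which is why two substrings are returned.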

Note that you can also get these matches with the built-in re module, by capturing the pattern inside a lookahead:

>>> import re
>>> re_a = r'(?=(?<!A)(A+N+A+))'
>>> print( re.findall(re_a, s) )
['AAANAA', 'AANA']
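
As a rough illustration of why the lookahead form works, the sketch below wraps a pattern in a zero-width lookahead with a capture group, so each match consumes no characters and the next attempt starts one position later. It then compares the result with and without the lookbehind. The helper name overlapping is hypothetical.

```python
import re

s = "AAANAANABNA"

def overlapping(pattern, text):
    """Collect (start, match) pairs by capturing inside a zero-width lookahead."""
    wrapped = "(?=(" + pattern + "))"
    return [(m.start(1), m.group(1)) for m in re.finditer(wrapped, text)]

# Without the lookbehind, every position inside a run of As starts a match.
without_guard = overlapping(r"A+N+A+", s)
# With the lookbehind, only the first A of each run may start a match.
with_guard = overlapping(r"(?<!A)A+N+A+", s)

print(without_guard)  # [(0, 'AAANAA'), (1, 'AANAA'), (2, 'ANAA'), (4, 'AANA'), (5, 'ANA')]
print(with_guard)     # [(0, 'AAANAA'), (4, 'AANA')]
```

The first list reproduces the five overlapping results from the question. The second keeps only the matches that begin at the start of a run of As.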

anubhava · Answer

Here is a simpler way to achieve this with the re module. We just need a lookahead for one or more trailing As, and two capture groups:

>>> import re
>>> s = 'AAANAANABNA'
>>> [''.join(x) for x in re.findall(r'(A+N+)(?=(A+))', s)]
['AAANAA', 'AANA']
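
The same idea can be sketched with re.finditer, which also exposes where each match starts. The trailing As are matched only inside the lookahead, so they are not consumed and remain available as the leading As of the next match. The function name shared_edge_matches is hypothetical.

```python
import re

def shared_edge_matches(text):
    """Return (start, substring) pairs for A+N+A+ runs that share boundary As."""
    results = []
    for m in re.finditer(r"(A+N+)(?=(A+))", text):
        # group(1) is the consumed 'A+N+' part; group(2) is the peeked trailing 'A+'.
        results.append((m.start(), m.group(1) + m.group(2)))
    return results

print(shared_edge_matches("AAANAANABNA"))  # [(0, 'AAANAA'), (4, 'AANA')]
```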


Tags: python, string-matching, regex, python-3.x, python-regex

Saikat
