I am trying to find all occurrences of a substring using a regular expression. The substring is composed of three parts: it starts with one or more 'A', followed by one or more 'N', and ends with one or more 'A'. Given the string 'AAANAANABNA', parsing it should return the two substrings 'AAANAA' and 'AANA'. I have tried the code below.
import regex as re
reg_a='A+N+A+'
s='AAANAANABNA'
sub_str=re.findall(reg_a,s,overlapped=True)
print(sub_str)
I am getting the following output:
['AAANAA', 'AANAA', 'ANAA', 'AANA', 'ANA']
But I want the output to be:
['AAANAA', 'AANA']
That is, the trailing A's of the first match should be the leading A's of the next match. How can I get that?
Make sure there is no A on the left:
>>> reg_a='(?<!A)A+N+A+'
>>> print( re.findall(reg_a,s,overlapped=True) )
['AAANAA', 'AANA']
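The (?<!A)A+N+A+ pattern matches:
(?<!A) - a negative lookbehind that matches a location that is not immediately preceded by A
A+ - one or more As
N+ - one or more Ns
A+ - one or more As
Note that you may use the built-in re module to get the matches, too. re.findall has no overlapped option, so the pattern is wrapped in a lookahead with a capturing group: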
>>> import re
>>> re_a = r'(?=(?<!A)(A+N+A+))'
>>> print( re.findall(re_a, s) )
['AAANAA', 'AANA']
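To see what the (?<!A) lookbehind contributes, here is a minimal sketch that runs the lookahead pattern above with and without it, using only the built-in re module. The variable names every_start and run_starts are made up for illustration, and re.finditer is used in the second call only to show where each match begins.

```python
import re

s = 'AAANAANABNA'

# Lookahead only: a match may begin at every A, which mirrors what
# overlapped=True returns with the plain A+N+A+ pattern.
every_start = re.findall(r'(?=(A+N+A+))', s)

# With (?<!A): a match may begin only at the first A of a run of As.
run_starts = [(m.start(1), m.group(1))
              for m in re.finditer(r'(?=(?<!A)(A+N+A+))', s)]

print(every_start)  # ['AAANAA', 'AANAA', 'ANAA', 'AANA', 'ANA']
print(run_starts)   # [(0, 'AAANAA'), (4, 'AANA')]
```

The second match starts at index 4, inside the span of the first one, which is why a plain non-overlapping search cannot find it.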
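Here is a simpler way of achieving this with the re module. We just need a lookahead for one or more trailing As and two capture groups. The lookahead does not consume the trailing As, so they remain available as the leading As of the next match: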
>>> import re
>>> s = 'AAANAANABNA'
>>> [''.join(x) for x in re.findall(r'(A+N+)(?=(A+))', s)]
['AAANAA', 'AANA']
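As a small illustration of how the two capture groups fit together, the sketch below prints the raw tuples that re.findall returns before they are joined. The names pairs and joined are chosen here for illustration only.

```python
import re

s = 'AAANAANABNA'

# Group 1 is the consumed A+N+ part; group 2 is the run of As that the
# lookahead peeks at without consuming.
pairs = re.findall(r'(A+N+)(?=(A+))', s)
print(pairs)   # [('AAAN', 'AA'), ('AAN', 'A')]

# Joining each tuple rebuilds the full A+N+A+ substring.
joined = [head + tail for head, tail in pairs]
print(joined)  # ['AAANAA', 'AANA']
```

Because the peeked As are not consumed, the search for the next match resumes right after the last N, so the same As serve as the tail of one match and the head of the next.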