
Regular expression for finding a sub-string

I am trying to find all occurrences of a sub-string using a regular expression. The sub-string has three parts: one or more 'A', followed by one or more 'N', followed by one or more 'A'. For example, given the string 'AAANAANABNA', I should get the two sub-strings 'AAANAA' and 'AANA' as the output. I tried the code below.

import regex as re
reg_a='A+N+A+'
s='AAANAANABNA'
sub_str=re.findall(reg_a,s,overlapped=True)
print(sub_str)

It gives me this output:

['AAANAA', 'AANAA', 'ANAA', 'AANA', 'ANA']

But I want this output:

['AAANAA', 'AANA']

That is, the trailing A's of one match should also serve as the leading A's of the next match. How can I get that?

Saikat asked Dec 14 '22 08:12



2 Answers

Make sure the match is not immediately preceded by an A:

>>> reg_a='(?<!A)A+N+A+'
>>> print( re.findall(reg_a,s,overlapped=True) )
['AAANAA', 'AANA']

The pattern (?<!A)A+N+A+ matches:

  • (?<!A) - a negative lookbehind that matches a location that is not immediately preceded by A
  • A+ - one or more As
  • N+ - one or more Ns
  • A+ - one or more As

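Note that the built-in re module can return the same matches, too. Since re.findall has no overlapped option, wrap the pattern in a lookahead with a capture group: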

>>> import re
>>> re_a = r'(?=(?<!A)(A+N+A+))'
>>> print( re.findall(re_a, s) )
['AAANAA', 'AANA']
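
To see what the lookbehind contributes, here is a minimal sketch that runs the lookahead form with and without it, using only the standard re module. The variable names without_lookbehind and with_lookbehind are made up for this illustration. Without the lookbehind, the result is the same five-item list from the question.

```python
import re

s = 'AAANAANABNA'

# A zero-width lookahead with a capture group: the engine retries at every
# position, so overlapping candidates are all reported.
without_lookbehind = re.findall(r'(?=(A+N+A+))', s)

# Same pattern, but a match may only start where the previous character
# is not 'A', i.e. at the beginning of a run of As.
with_lookbehind = re.findall(r'(?=(?<!A)(A+N+A+))', s)

print(without_lookbehind)  # ['AAANAA', 'AANAA', 'ANAA', 'AANA', 'ANA']
print(with_lookbehind)     # ['AAANAA', 'AANA']
```

The matches starting at the second and third A of 'AAANAA', and at the second A of 'AANA', are the ones the lookbehind rejects.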
Wiktor Stribiżew answered Dec 15 '22 21:12



Here is a simpler way to achieve this with the re module. We just need a lookahead for one or more trailing As, plus two capture groups that are joined afterwards:

>>> import re
>>> s = 'AAANAANABNA'
>>> [''.join(x) for x in re.findall(r'(A+N+)(?=(A+))', s)]
['AAANAA', 'AANA']
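
If the positions of the matches matter as well, the same pattern can be used with re.finditer. The following is only a sketch: the helper name find_ana_runs and its (start, end, text) return shape are assumptions made for illustration, not part of the answer above.

```python
import re

def find_ana_runs(text):
    """Return (start, end, substring) for each A+N+A+ run; end is exclusive."""
    pattern = re.compile(r'(A+N+)(?=(A+))')
    results = []
    for m in pattern.finditer(text):
        start = m.start(1)   # start of the consumed A+N+ part
        end = m.end(2)       # end of the trailing As captured inside the lookahead
        results.append((start, end, text[start:end]))
    return results

print(find_ana_runs('AAANAANABNA'))  # [(0, 6, 'AAANAA'), (4, 8, 'AANA')]
```

The two spans overlap on indices 4 and 5, which are the As shared between the first and second match. Because the trailing As are only looked at and not consumed, the next search can start on them.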


anubhava answered Dec 15 '22 22:12
