
How to capture the entire string while using 'lookaround' with chars in regex?

I have to find all strings that consist only of the letters 'a' and 'b', where every instance of 'a' is immediately preceded by 'b' and immediately followed by 'b'.

For example:

mystring = 'bab babab babbab ab baba aba xyz'

Then my regex should return:

['bab', 'babab', 'babbab']

(In 'ab', the 'a' is not preceded by 'b'; the same kind of problem applies to 'aba'. 'xyz' is not made of only 'a' and 'b'.)

I used lookbehind and lookahead for this and wrote this regex:

re.findall(r'((?<=b)a(?=b))',mystring)

But this only returns the individual instances of 'a' that are preceded and followed by 'b':

['a', 'a', 'a', 'a', 'a', 'a']

But I need whole words. How can I find whole words using regex? I tried to modify my regex with various options, but nothing seems to work. How can this be done?

Karthik Elango asked Oct 20 '22 01:10


1 Answer

You can use the following regex:

>>> re.findall(r'\b(?:b+a)+b+\b',mystring)
['bab', 'babab', 'babbab']

[Figure: regular expression visualization (Debuggex demo)]

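As the preceding diagram shows, (?:b+a)+ matches one or more groups, each made of one or more b followed by a single a, so every a is preceded by a b. The trailing b+ then requires at least one b after the last group, so every a is also followed by a b. The \b on each side anchors the match to word boundaries, which is why whole words are returned rather than single characters.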
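For reference, here is a small runnable sketch of the same idea. The first pattern is the regex from this answer, only spread out with re.VERBOSE so each piece can carry a comment. The second pattern is not from the answer: it is a hypothetical variant (the names WORD and LOOKAROUND are made up for illustration) that keeps the lookarounds from the question and wraps them so the whole word is consumed. Note that this variant also accepts words made only of b, which the answer's regex does not.

```python
import re

mystring = 'bab babab babbab ab baba aba xyz'

# The regex from the answer, written in verbose mode for readability.
WORD = re.compile(r"""
    \b            # start of a word
    (?: b+ a )+   # one or more groups: at least one 'b', then a single 'a'
    b+            # at least one closing 'b', so the last 'a' is followed by 'b'
    \b            # end of the word
""", re.VERBOSE)

print(WORD.findall(mystring))  # ['bab', 'babab', 'babbab']

# Illustrative variant (an assumption, not part of the answer): each character
# is either a 'b' or an 'a' that has a 'b' on both sides. Lookarounds are
# zero-width, so the surrounding (?:...)+ is what actually consumes the word.
LOOKAROUND = re.compile(r'\b(?:b|(?<=b)a(?=b))+\b')

print(LOOKAROUND.findall('bab baba bb ab'))  # ['bab', 'bb']
```

Because neither pattern contains a capturing group, re.findall returns the full match for each word instead of the contents of a group, which is what went wrong with the original ((?<=b)a(?=b)) attempt.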

Mazdak answered Nov 01 '22 09:11