
Python Regular Expression with optional but greedy groups

Tags: python, regex

I'm trying to write a regular expression to match a string that may or may not contain two tags. I need the expression to return all five elements of the string (text, tag, text, tag, text), whichever of them exist, but when I make the tags optional, the wildcard groups seem to gobble them up.

Inputs could be:

text{a}more{b}words  
{a}text{b}test  
text  
text{b}text  
text{b}  
text{a}text 

Et cetera. The only thing guaranteed is that {a} will always come before {b}, provided both exist.

My expression now looks as follows:

^(.*?)(\{a\})?(.*?)(\{b\})?(.*?)$

Unfortunately, this ends up throwing all the text into the last group, regardless of whether or not the tags are present. Is there some way to make the text groups greedy, yet keep the tag groups optional? re.findall doesn't seem to help either.
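A minimal reproduction of the problem, as a sketch (the helper name lazy_groups is hypothetical). Every lazy group is satisfied by an empty match and the optional tag groups are allowed to match nothing, so only the final group is forced to grow until `$` succeeds:

```python
import re

# The pattern from the question, unchanged.
lazy = re.compile(r"^(.*?)(\{a\})?(.*?)(\{b\})?(.*?)$")

def lazy_groups(text):
    """Return the five captured groups for text (hypothetical helper)."""
    return lazy.match(text).groups()

# The tags are never captured unless they sit exactly where the
# (still empty) lazy groups happen to stop.
print(lazy_groups("text{a}more{b}words"))
# ('', None, '', None, 'text{a}more{b}words')
print(lazy_groups("{a}text{b}test"))
# ('', '{a}', '', None, 'text{b}test')
```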

Any help would be greatly appreciated! :)

kmh asked Feb 23 '11 18:02


1 Answer

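Try the following regex: ^(.*(?={a})|.*?)({a})?(.*(?={b})|.*)({b})?(.*?)$

Each text group first tries a greedy .* that has to stop right before its tag (the lookahead (?={a}) or (?={b})); only if that tag is absent does it fall back to the second alternative.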

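import re

inputs = ['{a}text{b}test', 'text', 'text{b}text', 'text{b}', 'text{a}text']
# {a} and {b} are literal here: Python only treats {...} as a quantifier
# when the braces contain digits.
p = re.compile(r"^(.*(?={a})|.*?)({a})?(.*(?={b})|.*)({b})?(.*?)$")
for s in inputs:  # avoid shadowing the built-in input()
    print(p.match(s).groups())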

Output:

('', '{a}', 'text', '{b}', 'test')
('', None, 'text', None, '')
('', None, 'text', '{b}', 'text')
('', None, 'text', '{b}', '')
('text', '{a}', 'text', None, '')
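
As a sketch of how this pattern might be wrapped for reuse, the snippet below escapes the braces and names the groups. The function name split_tags and the group names head, middle, and tail are illustrative assumptions, not part of the pattern above. It also covers the first sample input from the question, which the inputs list leaves out.

```python
import re

# Same structure as the pattern above; braces escaped, group names hypothetical.
TAGGED = re.compile(
    r"^(?P<head>.*(?=\{a\})|.*?)"   # text before {a}; empty when {a} is absent
    r"(?P<a>\{a\})?"                # the optional {a} tag
    r"(?P<middle>.*(?=\{b\})|.*)"   # text up to {b}, or everything left without {b}
    r"(?P<b>\{b\})?"                # the optional {b} tag
    r"(?P<tail>.*?)$"               # whatever follows {b}
)

def split_tags(text):
    """Return the five parts as a tuple, or None if the pattern does not match."""
    m = TAGGED.match(text)
    return m.groups() if m else None

print(split_tags("text{a}more{b}words"))
# ('text', '{a}', 'more', '{b}', 'words')
print(split_tags("text"))
# ('', None, 'text', None, '')
```

Since . does not match a newline by default, input with an embedded newline can make match return None, which is why the helper checks for that case instead of calling groups() directly.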
Andrew Clark answered Sep 27 '22 23:09