I'm trying to write a regular expression to match a string that may or may not contain two tags. I need the expression to return all five parts of the string (text, {a}, text, {b}, text), with each part filled in only if it exists. But when I make the tags optional, the wildcard groups seem to swallow them:
Inputs could be:
text{a}more{b}words
{a}text{b}test
text
text{b}text
text{b}
text{a}text
Et cetera. The only guarantee is that {a} always comes before {b} when both are present.
My expression now looks as follows:
^(.*?)(\{a\})?(.*?)(\{b\})?(.*?)$
Unfortunately, this puts all of the text into the last group, whether or not the tags are present. Is there some way to make the tag groups greedy, yet keep them optional? re.findall doesn't seem to help either.
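One way to see what goes wrong is to run the original pattern on two of the inputs. This is a minimal sketch using Python's re module. The first lazy (.*?) is tried with zero characters. The optional (\{a\})? is then tried at that position and, if it doesn't match there, is simply skipped. The same happens to the next (.*?) and (\{b\})?, and only the final (.*?) is forced by $ to consume the rest. Because the overall match already succeeds at that point, the engine never goes back to give the earlier groups more text.

```python
import re

# The pattern from the question: every free-text group is lazy (.*?),
# and both tag groups are optional.
pattern = re.compile(r"^(.*?)(\{a\})?(.*?)(\{b\})?(.*?)$")

for s in ["text{a}more{b}words", "{a}text{b}test"]:
    # Expected: ('', None, '', None, 'text{a}more{b}words')
    #           ('', '{a}', '', None, 'text{b}test')
    print(pattern.match(s).groups())
```

Note that {a} is only captured when it sits at the very start of the string, because position 0 is the only place the optional group is ever tried.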
Any help would be greatly appreciated! :)
Try the following regex: ^(.*(?={a})|.*?)({a})?(.*(?={b})|.*)({b})?(.*?)$
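import re

inputs = ['{a}text{b}test', 'text', 'text{b}text', 'text{b}', 'text{a}text']
# Python's re treats {a} and {b} as literal text, since they are not valid {m,n} quantifiers.
p = re.compile(r"^(.*(?={a})|.*?)({a})?(.*(?={b})|.*)({b})?(.*?)$")
for s in inputs:  # 's' avoids shadowing the built-in input()
    print(p.match(s).groups())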
Output:
('', '{a}', 'text', '{b}', 'test')
('', None, 'text', None, '')
('', None, 'text', '{b}', 'text')
('', None, 'text', '{b}', '')
('text', '{a}', 'text', None, '')
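The answer's pattern may be easier to follow when written with re.VERBOSE, so that each group carries a comment. The sketch below is intended to be equivalent to the one-line pattern above. The only change is that the braces are escaped, which Python's re does not strictly require here because {a} is not a valid repetition. The sketch also runs the pattern on the question's first input, text{a}more{b}words, which the list above leaves out.

```python
import re

# Same pattern as the answer, spread out with re.VERBOSE so each part can be commented.
pattern = re.compile(r"""
    ^
    (.*(?=\{a\})|.*?)   # 1: text before {a} if {a} follows, otherwise empty
    (\{a\})?            # 2: the {a} tag, if present
    (.*(?=\{b\})|.*)    # 3: text before {b} if {b} follows, otherwise the rest
    (\{b\})?            # 4: the {b} tag, if present
    (.*?)               # 5: whatever remains
    $
""", re.VERBOSE)

for s in ["text{a}more{b}words", "{a}text{b}test"]:
    # Expected: ('text', '{a}', 'more', '{b}', 'words')
    #           ('', '{a}', 'text', '{b}', 'test')
    print(pattern.match(s).groups())
```

The lookahead alternatives are what make the tags effectively "greedy". When {a} (or {b}) appears later in the string, the first alternative stops the text group right before it, so the optional tag group that follows gets a chance to match. When the tag is absent, the lookahead fails and the second alternative takes over.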