
Python regex: Multiple matches in one line (using findall())

Tags: python, regex

I'm looking for these "tags" inside text: {t d="var1"}var2{/t} or {t d="varA"}varB{/t}. There can be more attributes; only "d" is mandatory: {t d="var1" foo="bar"}var2{/t}

My problem: when there are several tags on one line, only one result is returned instead of all of them. What is returned (from the test string below):

(u'single1', u'Required item3')

What I expect to be returned:

(u'single1', u'required1')
(u'single2', u'Required item2')
(u'single3', u'Required item3')

I'm stuck on this. It works with one tag per line, but not with several tags on one line.

# -*- coding: UTF-8 -*-
import re

test_string = u'''
<span><img src="img/ico/required.png" class="icon" alt="{t d="single1"}required1{/t}" title="{t d="single2"}Required item2{/t}" /> {t d="single3"}Required item3{/t}</span>
'''


re_pattern = '''
    \{t[ ]{1}       # start tag name
    d="         # "d" attribute
    ([a-zA-Z0-9]*)      # "d" attribute content
    ".*\}       # end of "d" attribute
    (.+)        # tag content
    \{/t\}      # end tag
'''
rec_pattern = re.compile(re_pattern, re.VERBOSE)

res = rec_pattern.findall(test_string)
if res is not None:
    for item in res:
        print item
dwich asked Dec 11 '22 19:12


1 Answer

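Your wildcards are greedy: .* and .+ match as much as they can. The .* after the "d" attribute runs on to the end of the last opening tag on the line, and the content group then takes whatever precedes the last {/t}, which is why single1 ends up paired with Required item3. Change .* to .*? and .+ to .+? so they'll be non-greedy: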

re_pattern = '''
    \{t[ ]{1}           # start tag name
    d="                 # "d" attribute
    ([a-zA-Z0-9]*)      # "d" attribute content
    ".*?\}              # end of "d" attribute
    (.+?)               # tag content
    \{/t\}              # end tag
'''
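
To see the difference side by side, here is a minimal sketch that runs both versions of the pattern against the test line from the question. It assumes Python 3 (so print is a function and results have no u prefix); the names GREEDY, LAZY, greedy_matches and lazy_matches are made up for this illustration, and the patterns are written as raw strings so the backslashes reach the regex engine unchanged.

```python
import re

# The line from the question: three {t ...}...{/t} tags on a single line.
test_string = (
    '<span><img src="img/ico/required.png" class="icon" '
    'alt="{t d="single1"}required1{/t}" '
    'title="{t d="single2"}Required item2{/t}" /> '
    '{t d="single3"}Required item3{/t}</span>'
)

# Original pattern: greedy .* and .+ (illustrative name, not from the post).
GREEDY = r'''
    \{t[ ]              # start tag name
    d="                 # "d" attribute
    ([a-zA-Z0-9]*)      # "d" attribute content
    ".*\}               # rest of the opening tag (greedy)
    (.+)                # tag content (greedy)
    \{/t\}              # end tag
'''

# Same pattern with non-greedy quantifiers, as suggested in the answer.
LAZY = r'''
    \{t[ ]              # start tag name
    d="                 # "d" attribute
    ([a-zA-Z0-9]*)      # "d" attribute content
    ".*?\}              # rest of the opening tag (non-greedy)
    (.+?)               # tag content (non-greedy)
    \{/t\}              # end tag
'''

greedy_matches = re.compile(GREEDY, re.VERBOSE).findall(test_string)
lazy_matches = re.compile(LAZY, re.VERBOSE).findall(test_string)

print(greedy_matches)  # [('single1', 'Required item3')]
for item in lazy_matches:
    print(item)
# ('single1', 'required1')
# ('single2', 'Required item2')
# ('single3', 'Required item3')
```

The non-greedy version also copes with extra attributes such as {t d="var1" foo="bar"}var2{/t}, because .*? stops at the first } that lets the rest of the pattern match.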
Ned Batchelder answered Jan 04 '23 23:01