 

QT C++ QRegularExpression multiple matches

Tags: c++, regex, qt

I want to extract information from a QString containing HTML by using regular expressions. I explicitly want to use a regex (no parser solutions) and the QRegularExpression class (for several reasons, e.g.: Reasons).

For simplicity, here is an equivalent problem.

Constructed source string:

<foo><bar s>INFO1.1</bar> </ qux> <peter></peter><bar e>INFO1.2
</bar><fred></ senseless></fred></ xx><lol></lol></foo><bar s>INFO2.1</bar>
</ nothing><endlessSenselessTags></endlessSenselessTags><rofl>
<bar e>INFO2.2</bar></rofl>

Note: There could be more or fewer INFOs and additional senseless tags (e.g. 6 INFOs).

Wanted:

INFO1.1, INFO1.2, INFO2.1 and INFO2.2 (e.g. in a list)

Attempts

1.

QRegularExpression reA(".*<bar [es]>(.*)</bar>.*", QRegularExpression::DotMatchesEverythingOption);

Result:

INFO1.1</bar> </ qux> <peter></peter><bar e>INFO1.2
</bar><fred></ senseless></fred></ xx><lol></lol></foo><bar s>INFO2.1</bar>
</ nothing><endlessSenselessTags></endlessSenselessTags><rofl>
<bar e>INFO2.2

2.

QRegularExpression reA("(.*<bar [es]>(.*)</bar>.*)*", QRegularExpression::DotMatchesEverythingOption);

Result: senseless output.

Problem: The regex always spans the whole string. For <bar s>INFO</bar><bar s>INFO</bar> it selects everything from the first <bar s> to the last </bar>. What I want is the text from each <bar s> up to the first </bar> that follows it.

With QRegExp there seems to be a solution, but I want to do this with QRegularExpression.

asked Mar 18 '14 at 19:03 by SearchSpace



1 Answer

Maybe you can try this:

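#include <QDebug>
#include <QRegularExpression>

// input: the QString holding the HTML from the question
QRegularExpression reA("(<bar [se]>[^<]+</bar>)");

QRegularExpressionMatchIterator i = reA.globalMatch(input);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    if (match.hasMatch()) {
        // captured(0) is the whole match, tags included
        qDebug() << match.captured(0);
    }
}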

This gives the following output:

"<bar s>INFO1.1</bar>" 
"<bar e>INFO1.2
</bar>" 
"<bar s>INFO2.1</bar>" 
"<bar e>INFO2.2</bar>"  

By contrast, this expression

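// Lookbehind and lookahead keep the tags out of the match; ((?!</bar>).)+
// consumes characters only until the next "</bar>", so each INFO is separate.
QRegularExpression reA("((?<=<bar [se]>)((?!</bar>).)+(?=</bar>))",
                       QRegularExpression::DotMatchesEverythingOption);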

applied to this input

<foo><bar s>INFO1</lol>.1</bar> </ qux> <peter></peter><bar e>INFO1.2
</bar><fred></ senseless></fred></ xx><lol></lol></foo><bar s>INFO2.1</bar>
</ nothing><endlessSenselessTags></endlessSenselessTags><rofl>
<bar e>INFO2.2</bar></rofl>

gives this output:

"INFO1</lol>.1" 
"INFO1.2
" 
"INFO2.1" 
"INFO2.2"
answered Sep 17 '22 at 22:09 by Salvatore Avanzo
