I want to extract information from a QString (HTML content) using regular expressions. I explicitly want to use regex (no parser solutions) and the class QRegularExpression, for several reasons.
To simplify, here is an equivalent problem.
Constructed source string:
<foo><bar s>INFO1.1</bar> </ qux> <peter></peter><bar e>INFO1.2
</bar><fred></ senseless></fred></ xx><lol></lol></foo><bar s>INFO2.1</bar>
</ nothing><endlessSenselessTags></endlessSenselessTags><rofl>
<bar e>INFO2.2</bar></rofl>
Note: There could be more or fewer INFOs and additional senseless tags (6 INFOs, for example).
Wanted:
INFO1.1, INFO1.2, INFO2.1 and INFO2.2 (e.g. in a list)
Attempt
1.
QRegularExpression reA(".*<bar [es]>(.*)</bar>.*", QRegularExpression::DotMatchesEverythingOption);
->
INFOa</bar> </ qux> <peter></peter><bar e>INFOb
</bar><fred></ senseless></fred></ xx><lol></lol></foo><bar s>INFOc</bar>
</ nothing><endlessSenselessTags></endlessSenselessTags><rofl>
<bar e>INFOd
2.
QRegularExpression reA("(.*<bar [es]>(.*)</bar>.*)*", QRegularExpression::DotMatchesEverythingOption);
-> senseless output
Problem:
The regex is always applied to the whole string. For <bar s>INFO</bar><bar s>INFO</bar> it selects the first <bar s> and the last </bar>. What I want is a match from the first <bar s> to the first </bar>.
With QRegExp there seems to be a solution, but I want to do this with QRegularExpression.
Maybe you can try this:
QRegularExpression reA("(<bar [se]>[^<]+</bar>)");
QRegularExpressionMatchIterator i = reA.globalMatch(input);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    if (match.hasMatch()) {
        qDebug() << match.captured(0);
    }
}
that gives me this output
"<bar s>INFO1.1</bar>"
"<bar e>INFO1.2
</bar>"
"<bar s>INFO2.1</bar>"
"<bar e>INFO2.2</bar>"
while this expression
QRegularExpression reA("((?<=<bar [se]>)((?!</bar>).)+(?=</bar>))",
QRegularExpression::DotMatchesEverythingOption);
with this input
<foo><bar s>INFO1</lol>.1</bar> </ qux> <peter></peter><bar e>INFO1.2
</bar><fred></ senseless></fred></ xx><lol></lol></foo><bar s>INFO2.1</bar>
</ nothing><endlessSenselessTags></endlessSenselessTags><rofl>
<bar e>INFO2.2</bar></rofl>
gives me as output
"INFO1</lol>.1"
"INFO1.2
"
"INFO2.1"
"INFO2.2"