
C++ TR1 regex - multiline option

Tags: c++, regex, tr1

I thought that $ matches the end of the string. However, the following code prints "testbbbccc", which surprised me: it means that $ actually matches the end of a line, not the end of the whole string.

#include <iostream>
#include <regex>
#include <string>
#include <vector>

using namespace std;

int main()
{
    tr1::regex r("aaa([^]*?)(ogr|$)");
    string test("bbbaaatestbbbccc\nddd");
    vector<int> captures;
    captures.push_back(1);  // report only capture group 1
    const tr1::sregex_token_iterator end;
    for (tr1::sregex_token_iterator iter(test.begin(), test.end(), r, captures); iter != end; ++iter)
    {
        // str() returns a temporary, so copy it instead of binding a non-const reference
        string t1 = iter->str();
        cout << t1;
    }
}

I have been trying to find a "multiline" switch (which is easy to find in PCRE), but without success. Can someone point me in the right direction?

Regards, R.P.

R.P. asked Dec 10 '10 12:12



2 Answers

Since Boost.Regex was the basis for the TR1 regex library, its documentation is worth checking.

From the Boost.Regex documentation:

Anchors:

A '^' character shall match the start of a line when used as the first character of an expression, or the first character of a sub-expression.

A '$' character shall match the end of a line when used as the last character of an expression, or the last character of a sub-expression.

So the behavior you observed is correct.

Also from the Boost.Regex documentation, these anchors match only at buffer boundaries:

\A Matches at the start of a buffer only (the same as \`).
\z Matches at the end of a buffer only (the same as \').
\Z Matches an optional sequence of newlines at the end of a buffer: equivalent to the regular expression \n*\z

I hope that helps.

Tobias Langner answered Oct 10 '22 20:10



There is no multiline switch in TR1 regexes. It's not exactly the same, but you can get similar functionality with a sub-expression that matches every character, line breaks included:

(.|\r|\n)*?

This matches every character non-greedily, including newline and carriage return.

Note: Remember to escape each backslash ('\' becomes '\\') if your pattern is written as a C++ string literal in code.

Note 2: If you don't want to capture the matched contents, add '?:' right after the opening parenthesis:

(?:.|\r|\n)*?
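
For illustration, here is a minimal sketch of that pattern in use. It assumes a C++11 compiler and uses std::regex (the standardized successor of tr1::regex) with the default ECMAScript grammar. The helper capture_between and the two pattern constants are hypothetical names made up for this example, and the pattern ends in a literal "ogr" rather than "$" so the result should not depend on how a given library treats "$".

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns capture group 1
// of the first match, or an empty string if the pattern does not match.
std::string capture_between(const std::string& input, const std::string& pattern)
{
    const std::regex re(pattern);  // default ECMAScript grammar
    std::smatch m;
    if (std::regex_search(input, m, re))
        return m[1].str();
    return std::string();
}

// '.' alone does not match line breaks, so this pattern cannot span a '\n'.
const char* const dot_only = "aaa(.*?)ogr";

// Each backslash is doubled because the pattern is a C++ string literal.
// The inner group is non-capturing; the outer group is capture group 1.
const char* const dot_or_newline = "aaa((?:.|\\r|\\n)*?)ogr";
```

Calling capture_between("bbbaaatest\nbbbogrccc", dot_or_newline) should return the text between "aaa" and "ogr" with the newline included, while the same call with dot_only should find no match at all.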
Juan Calero answered Oct 10 '22 21:10
