 

C++11 regex end-of-line doesn't match

Tags: c++, regex, c++11

I can't get the $ (dollar-sign) to work as documented in C++11 regular expressions. This is with ECMAScript syntax (the default).

Example (regex.cc):

#include <iostream>
#include <regex>

int main() {
    if ( std::regex_search("one\ntwo", std::regex{"one$"}) ) {
        std::cout << "Should match, doesn't." << std::endl;
    }

    if ( std::regex_search("one\ntwo", std::regex{"two$"}
                         , std::regex_constants::match_not_eol) ) {
        std::cout << "Shouldn't match, does." << std::endl;
    }

    return 0;
}

Expected output: Should match, doesn't.

Actual output: Shouldn't match, does.

From http://www.cplusplus.com/reference/regex/ECMAScript/:

$ - End of line - Either it is the end of the target sequence, or precedes a line terminator.

From http://www.cplusplus.com/reference/regex/regex_search/:

match_not_eol - Not End-Of-Line - The last character is not considered an end of line ("$" does not match).

Tested with Clang 3.3 and 3.4 on FreeBSD 10:

clang++ -std=c++11 -stdlib=libc++ -o regex regex.cc && ./regex

What am I missing?

asked Feb 10 '14 22:02 by msimonsson




1 Answer

It looks like you stumbled on LWG issue 2343.

To quote:

If Multiline is true, $ matches just before LineTerminator.

If Multiline is false, $ does not match just before LineTerminator.

[...]

Multiline of the existing implementations are as follows:

Multiline=false:
    libstdc++ r206594
    libc++ r199174

Multiline=true:
    Visual Studio Express 2013
    boost 1.55
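
To check which group a given standard library falls into, a small probe along these lines may help. This is only a sketch: the two function names are hypothetical, and the result of the first one depends on the library and version in use.

```cpp
#include <regex>
#include <string>

// Hypothetical helper names; they are not part of any library.

// Returns true when "$" matches just before the '\n' inside the input,
// i.e. when the library behaves as if Multiline were true.
// The result is implementation-dependent (see LWG issue 2343).
bool dollar_matches_before_newline() {
    const std::string input = "one\ntwo";
    return std::regex_search(input, std::regex{"one$"});
}

// Returns true when "$" matches at the very end of the input.
// Both Multiline settings agree on this case when no match flags are passed.
bool dollar_matches_at_end_of_input() {
    const std::string input = "one\ntwo";
    return std::regex_search(input, std::regex{"two$"});
}
```

Calling dollar_matches_before_newline() from main and printing the result shows whether the first test in the question can succeed with that library.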

Note: with the current SVN version of libc++, your first test does match, so it looks like this LWG issue is going to be resolved in favor of Multiline=true.

The second issue (match_not_eol ignored) looks like a fairly straightforward implementation bug. Boost.regex doesn't match that test case.
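
Until the implementations agree, one way to avoid depending on the Multiline setting is to spell out both cases in the pattern itself. The following is a minimal sketch, assuming ECMAScript lookahead, (?=...), which is part of the default grammar; the function name ends_a_line is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether `word` occurs directly before a
// newline or at the very end of `text`.
// `word` is pasted into the pattern verbatim, so this sketch assumes it
// contains no regex metacharacters.
bool ends_a_line(const std::string& text, const std::string& word) {
    // The lookahead accepts either an explicit '\n' or the end of the input,
    // so the outcome does not depend on how "$" treats line terminators.
    const std::regex re{word + "(?=\\n|$)"};
    return std::regex_search(text, re);
}
```

With the input from the question, ends_a_line("one\ntwo", "one") is expected to return true whichever Multiline behavior the library implements, while ends_a_line("one two", "one") returns false.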

answered Sep 26 '22 03:09 by Cubbi
