
How to match line-break in c++ regex?

Tags: c++, regex

I tried the following regex:

#include <iostream>
#include <regex>
#include <string>

const static char * regex_string = "([a-zA-Z0-9]+).*";

void find_first(const std::string str);

int main(int argc, char ** argv)
{
        // The input contains a CRLF line break in the middle.
        find_first("0s7fg9078dfg09d78fg097dsfg7sdg\r\nfdfgdfg");
}

void find_first(const std::string str)
{
        std::cout << str << std::endl;
        std::regex rgx(regex_string);
        std::smatch matcher;
        // regex_match succeeds only if the pattern matches the entire string.
        if(std::regex_match(str, matcher, rgx))
        {
                std::cout << "Found : " << matcher.str(0) << std::endl;
        } else {
                std::cout << "Not found" << std::endl;
        }
}

DEMO

I expected the regex to match the whole string and the group to be found, but it wasn't. Why? How can I match a line break in C++ regex? In Java it works fine.

St.Antario asked Nov 15 '15 09:11



2 Answers

The dot in regex usually matches any character other than a newline. That is also how it behaves in the default std::regex syntax, std::regex::ECMAScript:

.   not newline   any character except line terminators (LF, CR, LS, PS).

0s7fg9078dfg09d78fg097dsfg7sdg\r\nfdfgdfg
[a-zA-Z0-9]+ matches until \r ↑___↑ .* would match from here
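
Because std::regex_match only succeeds when the pattern covers the entire string, the \r\n that neither [a-zA-Z0-9]+ nor .* can consume makes the match fail. A minimal sketch of that behaviour, assuming the default ECMAScript grammar; the helper names matches_whole and first_match are hypothetical and not part of the question:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the pattern matches the ENTIRE input.
bool matches_whole(const std::string& input, const std::string& pattern)
{
    const std::regex rgx(pattern);  // default grammar: std::regex::ECMAScript
    return std::regex_match(input, rgx);
}

// Hypothetical helper: the first substring the pattern matches, or "" if none.
std::string first_match(const std::string& input, const std::string& pattern)
{
    const std::regex rgx(pattern);
    std::smatch m;
    // regex_search accepts a partial match, unlike regex_match.
    return std::regex_search(input, m, rgx) ? m.str(0) : std::string();
}
```

With the string from the question, matches_whole should return false, while first_match should return only the first line, since the dot stops at \r.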

In many regex flavors there is a dotall flag available to make the dot also match newlines.

If not, there are workarounds in different languages, such as [^] (not nothing) or [\S\s] (any whitespace or non-whitespace character together in one class), which results in any character including \n:

regex_string = "([a-zA-Z0-9]+)[\\S\\s]*";

Or use optional line breaks: ([a-zA-Z0-9]+).*(?:\\r?\\n.*)* or ([a-zA-Z0-9]+)(?:.|\\r?\\n)*
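
The three workaround patterns can be tried side by side. This is only a sketch under the default ECMAScript grammar; the helper first_group and the constant names are hypothetical, chosen for illustration:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: full-match `input` against `pattern` and return
// capture group 1, or an empty string when the match fails.
std::string first_group(const std::string& input, const std::string& pattern)
{
    const std::regex rgx(pattern);  // default grammar: std::regex::ECMAScript
    std::smatch m;
    if (std::regex_match(input, m, rgx))
        return m.str(1);
    return std::string();
}

// The patterns from the answer, written as C++ string literals.
const char* const any_char_class  = "([a-zA-Z0-9]+)[\\S\\s]*";
const char* const optional_breaks = "([a-zA-Z0-9]+).*(?:\\r?\\n.*)*";
const char* const dot_or_break    = "([a-zA-Z0-9]+)(?:.|\\r?\\n)*";
```

For the string from the question, each of the three patterns should match the whole input, with group 1 holding the leading alphanumeric run.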

See your updated demo


Update - Another idea worth mentioning: std::regex::extended

A <period> ( '.' ), when used outside a bracket expression, is an ERE that shall match any character in the supported character set except NUL.

std::regex rgx(regex_string, std::regex::extended);

See this demo at tio.run
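
To compare the two grammars directly, something like the following could be used. It is a sketch, not a guarantee: whether the dot matches line breaks under std::regex::extended depends on the standard library implementation (it does in libstdc++, as far as I know), and the helper name matches_whole_with is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: full-match `input` against `pattern` using the
// grammar passed in `grammar` (e.g. std::regex::extended).
bool matches_whole_with(const std::string& input,
                        const std::string& pattern,
                        std::regex::flag_type grammar)
{
    const std::regex rgx(pattern, grammar);
    return std::regex_match(input, rgx);
}
```

Calling it with the original pattern and std::regex::extended is expected to succeed on the string from the question, while std::regex::ECMAScript is expected to fail. Note that POSIX ERE lacks ECMAScript features such as (?:...) and \s, so only patterns valid in both grammars can be switched over this way.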

bobble bubble answered Sep 20 '22 11:09



You may try const static char * regex_string = "((.|\r\n)*)"; I hope it helps.

Rajib Chy answered Sep 19 '22 11:09
