 

C++ Regex not matching multiline strings

Tags: c++, regex

I am having problems with C++0x regex when the string I'm matching is a multiline string. Here is the code snippet I'm trying to use:

#include <iostream>
#include <regex>
#include <string>
int main() {
    std::smatch regMatch;
    std::string data = "<key>id</key><string>1</string>\n<key>user</key><string>admin</string>";
    if (std::regex_match(data, regMatch, std::regex("<key>user</key><string>(.*?)</string>"))) {
        std::cout << "Reg match: " << regMatch[1].str() << std::endl;
    }
}
Nikola C asked Mar 12 '14 23:03




2 Answers

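You should use regex_search instead of regex_match. std::regex_match succeeds only if the pattern matches the entire input string, whereas std::regex_search succeeds if the pattern matches any substring of it. Your data begins with <key>id</key>, so regex_match fails before the newline even comes into play; the "user" entry sits entirely on the second line, so regex_search finds it.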

By the way, why not use (.*) instead of (.*?)?

Tim Shen answered Oct 02 '22 05:10


The dot . does not match newline characters by default. You can add the switch (?s) to the beginning of the regex to switch on newline matching for the dot:

(?s)<key>user</key><string>(.*?)</string>

However, I'm not a huge fan of this because not all regex engines support it; in particular, the ECMAScript grammar that std::regex uses by default has no inline (?s) flag. Additionally, there might be another part of your pattern involving a dot that you don't want to match newlines. My preferred method is to use a character set containing a character class such as \s or \w together with its negated class. Since every character is either in the class or in its negation, such a set matches absolutely everything, including newlines:

<key>user</key><string>([\w\W]*?)</string>

Maybe I'm misinterpreting how your XML is going to be parsed, but I've got to say it's a bit odd that you intend to capture a value for the key "user" that may or may not contain newlines (and other whitespace characters, and any other characters at all). Are you really okay with a user named

admin$#* &% '"; _____?

CAustin answered Oct 02 '22 05:10