I am having problems with C++0x regex when the string I'm matching is a multiline string. Here is the code snippet I'm trying to use:
std::smatch regMatch;
std::string data = "<key>id</key><string>1</string>\n<key>user</key><string>admin</string>";
if (std::regex_match(data, regMatch, std::regex("<key>user</key><string>(.*?)</string>"))) {
    std::cout << "Reg match: " << regMatch[1].str() << std::endl;
}
The dot matches every character except line terminators such as \r and \n. Use [\s\S] instead, which matches any character, including newlines.
The multiline option (or the inline m option, in engines that offer one) lets the regular expression engine handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string. It does not change what the dot matches.
Line breaks: to match a line break explicitly in a regex, use \n, or \r\n for Windows-style line endings.
The fundamental building blocks of a regex are patterns that match a single character. Most characters, including all letters (a-z and A-Z) and digits (0-9), match themselves. For example, the regex x matches the substring "x"; z matches "z"; and 9 matches "9".
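You should use regex_search instead of regex_match. regex_match succeeds only if the pattern matches the entire input string, whereas regex_search finds a match anywhere inside it.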
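To make the difference concrete, here is a minimal sketch using the pattern and data from the question. The helper names find_user and whole_string_matches are hypothetical, chosen only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the captured user value,
// or an empty string when the pattern is not found.
std::string find_user(const std::string& data) {
    static const std::regex pattern("<key>user</key><string>(.*?)</string>");
    std::smatch match;
    // regex_search succeeds if any substring of data matches the pattern.
    if (std::regex_search(data, match, pattern)) {
        return match[1].str();
    }
    return "";
}

// Hypothetical helper: the same pattern with regex_match, which requires
// the pattern to cover the whole input from the first to the last character.
bool whole_string_matches(const std::string& data) {
    static const std::regex pattern("<key>user</key><string>(.*?)</string>");
    return std::regex_match(data, pattern);
}
```

With the two-line data string from the question, find_user should return "admin", while whole_string_matches returns false because the input starts with the id entry rather than the user entry.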
By the way, why not use (.*) instead of (.*?)?
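One possible reason to keep the lazy form: if several string elements end up on the same line, the greedy (.*) runs to the last closing tag on that line. The sketch below assumes a hypothetical helper capture_first and a made-up single-line input, neither of which comes from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first capture group of the first match,
// or an empty string when the pattern does not match.
std::string capture_first(const std::string& text, const std::string& pattern) {
    std::smatch match;
    if (std::regex_search(text, match, std::regex(pattern))) {
        return match[1].str();
    }
    return "";
}
```

For the input "<string>1</string><string>admin</string>", the lazy pattern "<string>(.*?)</string>" captures "1", while the greedy pattern "<string>(.*)</string>" captures "1</string><string>admin". On the data in the question both forms give the same result, because the dot stops at the newline and the user entry is the last one on its line.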
The dot . does not match newline characters by default. You can add the switch (?s) to the beginning of the regex to switch on newline matching for the dot:
(?s)<key>user</key><string>(.*?)</string>
However, I'm not a huge fan of this because not all regex engines support it (the default ECMAScript grammar of std::regex, for one, does not accept inline modifiers such as (?s)). Additionally, there might be another dot in your pattern that you don't want to match newlines. My preferred method is to use a character set that combines a character class such as \s or \w with its negated class. It's a pretty straightforward way of telling the regex to match absolutely any character:
<key>user</key><string>([\w\W]*?)</string>
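As a rough sketch of how that pattern could be used from C++, the function below applies it with regex_search. The name find_user_any_char is hypothetical, and the raw string literal is just a convenience to avoid doubling the backslashes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: captures the user value even when it spans lines.
std::string find_user_any_char(const std::string& data) {
    // [\w\W] matches any character, newlines included; *? keeps the match
    // lazy so it stops at the first closing </string> tag.
    static const std::regex pattern(
        R"(<key>user</key><string>([\w\W]*?)</string>)");
    std::smatch match;
    if (std::regex_search(data, match, pattern)) {
        return match[1].str();
    }
    return "";
}
```

Given an input such as "<key>user</key><string>ad\nmin</string>", this should capture the value including its embedded newline, which the (.*?) version would fail to match at all.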
Maybe I'm misinterpreting how your XML is going to be parsed, but I've got to say that it's a bit odd how you intend to capture a string with the key name "user" that may or may not contain newlines (and other whitespace characters, and all other characters). Are you really okay with a user named
admin$#*
&% '";
_____
?