Say you have a string which is provided by the user. It can contain any kind of character. Examples are:
std::string s1{"hello world");
std::string s1{".*");
std::string s1{"*{}97(}{.}}\\testing___just a --%#$%# literal%$#%^"};
...
Now I want to search in some text for occurrences of >>
followed by the input string s1
followed by <<
. For this, I have the following code:
std::string input; // the input text
std::regex regex{">> " + s1 + " <<"};
if (std::regex_match(input, regex)) {
// add logic here
}
This works fine if s1
did not contain any special characters. However, if s1
had some special characters, which are recognized by the regex engine, it doesn't work.
How can I escape s1
such that std::regex
considers it as a literal, and therefore does not interpret s1
? In other words, the regex should be:
std::regex regex{">> " + ESCAPE(s1) + " <<"};
Is there a function like ESCAPE()
in std
?
important I simplified my question. In my real case, the regex is much more complex. As I am only having troubles with the fact the s1
is interpreted, I left these details out.
The regular expression \s is a predefined character class. It indicates a single whitespace character. Let's review the set of whitespace characters: [ \t\n\x0B\f\r]
The backslash in a regular expression precedes a literal character. You also escape certain letters that represent common character classes, such as \w for a word character or \s for a space.
The Match-zero-or-more Operator ( * ) This operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern. `*' represents this operator. For example, `o*' matches any string made up of zero or more `o' s.
A repeat is an expression that is repeated an arbitrary number of times. An expression followed by '*' can be repeated any number of times, including zero. An expression followed by '+' can be repeated any number of times, but at least once.
You will have to escape all special characters in the string with \
. The most straightforward approach would be to use another expression to sanitize the input string before creating the expression regex
.
// matches any characters that need to be escaped in RegEx
std::regex specialChars { R"([-[\]{}()*+?.,\^$|#\s])" };
std::string input = ">> "+ s1 +" <<";
std::string sanitized = std::regex_replace( input, specialChars, R"(\$&)" );
// "sanitized" can now safely be used in another expression
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With