c++11/regex - search for exact string, escape [duplicate]

Say you have a string which is provided by the user. It can contain any kind of character. Examples are:

std::string s1{"hello world"};
std::string s1{".*"};
std::string s1{"*{}97(}{.}}\\testing___just a --%#$%# literal%$#%^"};
...

Now I want to search in some text for occurrences of >> followed by the input string s1 followed by <<. For this, I have the following code:

std::string input; // the input text
std::regex regex{">> " + s1 + " <<"};

if (std::regex_search(input, regex)) {
    // add logic here
}

This works fine if s1 does not contain any special characters. However, if s1 contains characters that the regex engine treats as special, it doesn't work.
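
To make the failure concrete, here is a minimal sketch of what happens when the user string is spliced into the pattern unescaped. The helper name naive_contains is hypothetical and only for illustration; it uses std::regex_search because the goal is to find an occurrence inside a larger text.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the pattern without any escaping and
// reports whether it is found somewhere in `text`.
bool naive_contains(const std::string& text, const std::string& s1) {
    try {
        const std::regex pattern{">> " + s1 + " <<"};
        return std::regex_search(text, pattern);
    } catch (const std::regex_error&) {
        // Some user strings are not even valid patterns, e.g. "(".
        return false;
    }
}
```

With this sketch, naive_contains(">> anything <<", ".*") is true even though the text does not contain a literal ".*", and naive_contains(">> a+b <<", "a+b") is false because + is read as a quantifier.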

How can I escape s1 so that std::regex treats it as a literal and does not interpret it? In other words, the regex should be:

std::regex regex{">> " + ESCAPE(s1) + " <<"};

Is there a function like ESCAPE() in std?

Important: I simplified my question. In my real case, the regex is much more complex. As I am only having trouble with the fact that s1 is interpreted, I left these details out.

Karel Demeester asked Oct 22 '16 18:10


1 Answer

The standard library has no built-in escape function for std::regex, so you will have to escape all special characters in the string with \ yourself. The most straightforward approach is to use another expression to sanitize the input string before constructing the regex.

// matches any character that needs to be escaped in a regex,
// including the backslash itself
std::regex specialChars { R"([-[\]{}()*+?.,\^$|#\s\\])" };

// prefix every special character in s1 with a backslash ($& is the match)
std::string sanitized = std::regex_replace( s1, specialChars, R"(\$&)" );

// "sanitized" can now safely be used in another expression
std::regex regex{ ">> " + sanitized + " <<" };
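
Put together, the approach could look like the sketch below. The function names escape_regex and contains_marked are hypothetical, not part of std, and the sketch assumes the default ECMAScript grammar. It uses std::regex_search rather than std::regex_match, since std::regex_match only succeeds when the whole input matches the pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of std): prefixes every regex-special
// character in `literal` with a backslash so that the default ECMAScript
// grammar treats the whole string literally.
std::string escape_regex(const std::string& literal) {
    // Character class of regex-special characters; the trailing \\ covers
    // the backslash itself.
    static const std::regex special_chars{R"([-[\]{}()*+?.,\^$|#\s\\])"};
    // In the replacement format, $& stands for the matched character.
    return std::regex_replace(literal, special_chars, R"(\$&)");
}

// Returns true if `text` contains ">> " + literal + " <<" anywhere.
bool contains_marked(const std::string& text, const std::string& literal) {
    const std::regex pattern{">> " + escape_regex(literal) + " <<"};
    return std::regex_search(text, pattern);
}
```

For example, contains_marked("foo >> .* << bar", ".*") is true, while contains_marked("foo >> abc << bar", ".*") is false, because the escaped pattern only matches the two literal characters.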
Austin Brunkhorst answered Oct 07 '22 11:10