How to escape a string for use in Boost Regex

I'm just getting my head around regular expressions, and I'm using the Boost Regex library.

I need to use a regex that includes a specific URL, and it chokes because the URL contains characters that are reserved in regex syntax and need to be escaped.

Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.

Alternatively, is there a list of all characters that would need to be escaped?

asked by Gerald, Aug 10 '09 03:08




2 Answers

These are the characters that need to be escaped:

. ^ $ | ( ) [ ] { } * + ? \
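Given that list, the escaping itself doesn't strictly require a regex. The sketch below is a plain loop, not a Boost facility, and `escape_for_regex` is a hypothetical name. It assumes the default Perl-style syntax, where a backslash in front of any of these punctuation characters makes it match literally:

```cpp
#include <string>

// Hypothetical helper (not part of Boost): backslash-escape every character
// from the list above so the result matches the input literally.
std::string escape_for_regex(const std::string& s) {
    static const std::string metachars = ".^$|()[]{}*+?\\";
    std::string out;
    out.reserve(s.size() * 2);  // worst case: every character gets escaped
    for (char c : s) {
        if (metachars.find(c) != std::string::npos) {
            out += '\\';  // prefix the metacharacter with a backslash
        }
        out += c;
    }
    return out;
}
```

Characters outside the list (letters, digits, `/`, `:`) are passed through unchanged, which matters because some backslash-letter pairs such as `\d` or `\b` have their own meaning.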

Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.

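    #include <boost/regex.hpp>
    #include <string>

    // Matches any single regex metacharacter.
    const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");
    // sed format: "\\" emits a literal backslash, "&" emits the whole match.
    const std::string rep("\\\\&");
    std::string result = boost::regex_replace(url_to_escape, esc, rep,
                                              boost::match_default | boost::format_sed);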

(The flag boost::format_sed selects sed's replacement-string format. In that format, an escaped backslash \\ outputs a literal backslash, and an unescaped & outputs whatever was matched by the whole expression.)

Or, if you are not comfortable with sed's replacement-string format, change the flag to boost::format_perl and use the familiar $& to refer to whatever was matched by the whole expression:

    const std::string rep("\\\\$&");
    std::string result = boost::regex_replace(url_to_escape, esc, rep,
                                              boost::match_default | boost::format_perl);
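If your compiler supports C++11, the same approach can be sketched with `std::regex` instead of Boost. One difference to flag: the default (ECMAScript) replacement format of `std::regex_replace` does not treat a backslash as an escape, so the format literal here is `"\\$&"` (a literal backslash followed by the whole match) rather than `"\\\\$&"`:

```cpp
#include <regex>
#include <string>

// Sketch using C++11 std::regex (not Boost): prefix every regex
// metacharacter in s with a backslash.
std::string regex_escape(const std::string& s) {
    // Character class of metacharacters; '[', ']' and '\' are escaped
    // inside the class, the rest are already literal there.
    static const std::regex metachars(R"([.^$|()\[\]{}*+?\\])");
    // ECMAScript format: '\' is copied as-is, "$&" inserts the whole match.
    return std::regex_replace(s, metachars, "\\$&");
}
```

The escaped string can then be concatenated with other pattern pieces, for example `std::regex("^" + regex_escape(url) + "$")`.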
answered by Amber, Sep 24 '22 01:09



Using code from Dav (+ a fix from the comments), I created an ASCII/Unicode function regex_escape():

    #include <boost/regex.hpp>
    #include <string>

    std::wstring regex_escape(const std::wstring& string_to_escape) {
        // Matches any single regex metacharacter.
        static const boost::wregex re_boostRegexEscape(L"[.^$|()\\[\\]{}*+?\\\\]");
        // sed format: "\\" emits a literal backslash, "&" emits the whole match.
        const std::wstring rep(L"\\\\&");
        std::wstring result = boost::regex_replace(string_to_escape, re_boostRegexEscape, rep,
                                                   boost::match_default | boost::format_sed);
        return result;
    }

For an ASCII version, use std::string/boost::regex instead of std::wstring/boost::wregex.
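The two versions can also be folded into one function template over the character type. This is a sketch that swaps in C++11 `std::basic_regex` for Boost's classes; because the pattern and replacement are pure ASCII, it widens them from narrow strings instead of needing `_T` or `L` literals:

```cpp
#include <regex>
#include <string>

// Sketch: one template serving both std::string and std::wstring,
// built on C++11 std::basic_regex (an assumption; the answer uses Boost).
template <typename CharT>
std::basic_string<CharT> regex_escape(const std::basic_string<CharT>& s) {
    // ASCII-only pattern and format, widened character by character.
    static const std::string narrow_pattern = R"([.^$|()\[\]{}*+?\\])";
    static const std::string narrow_format = "\\$&";  // literal '\' + whole match
    static const std::basic_regex<CharT> metachars(
        std::basic_string<CharT>(narrow_pattern.begin(), narrow_pattern.end()));
    static const std::basic_string<CharT> replacement(narrow_format.begin(),
                                                      narrow_format.end());
    return std::regex_replace(s, metachars, replacement);
}
```

Note that the argument must already be a `std::string` or `std::wstring`; a bare string literal won't deduce `CharT`.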

answered by Nishi, Sep 20 '22 01:09
