I'm just getting my head around regular expressions, and I'm using the Boost Regex library.
I have a need to use a regex that includes a specific URL, and it chokes because obviously there are characters in the URL that are reserved for regex and need to be escaped.
Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.
Alternatively, is there a list of all characters that would need to be escaped?
In a regex, \ is the escape character: it restores the literal meaning of the character that follows it. Characters such as *, +, ? (occurrence indicators) and ^, $ (position anchors) have special meanings in a regex, so to match them literally you must precede them with \.
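A minimal sketch of the difference, using the standard <regex> header (ECMAScript grammar) instead of Boost so it builds without extra dependencies. The helper matches() is a hypothetical name. Escaping + and $ with a backslash works the same way in Boost's default Perl syntax.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if `pattern` matches the whole of `text`.
// std::regex uses the ECMAScript grammar by default.
bool matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// "a+b": '+' is an occurrence indicator ("one or more a, then b"),
// so it matches "aaab" but not the literal text "a+b".
// "a\\+b" in C++ source is the regex a\+b: the backslash restores the
// literal meaning of '+', so it matches exactly "a+b".
// Likewise "\\$5" (the regex \$5) matches the text "$5", while "$5"
// treats '$' as an end-of-input anchor and cannot match it.
```

For example, matches("a+b", "aaab") and matches("a\\+b", "a+b") are both true, while matches("a+b", "a+b") is false.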
String newstr = "\\"; Inside a string literal, \ is also a special character used for escaping. "\" does not work because the backslash escapes the closing ". To get a literal \ you need to escape it with another \.
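C++ string literals follow the same rule, which is why the Boost patterns in the answers below contain runs of doubled backslashes. A small illustration; the variable names are made up, and the raw string literal form requires C++11.

```cpp
#include <cassert>
#include <string>

// In ordinary C++ string literals the backslash is itself an escape
// character, so every backslash the regex engine should see is written twice.
const std::string one_backslash = "\\";     // one character: a backslash
const std::string escaped_dot   = "\\.";    // two characters: the regex \.
// Raw string literals (C++11) turn off literal escaping; this is also \.
const std::string raw_escaped_dot = R"(\.)";
// A regex that matches one literal backslash (\\) needs four in a normal literal.
const std::string regex_backslash = "\\\\"; // two characters: the regex \\
```

Raw string literals are often the easier choice for longer patterns, since the pattern then reads exactly as the regex engine sees it.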
Boost.Regex lets you use regular expressions in C++. Because a version of the library was adopted into the standard library in C++11 (std::regex in the <regex> header), you don't need to depend on Boost.Regex if your development environment supports C++11.
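As a sketch of what that looks like, here is the regex-based escaper from the answers below adapted to std::regex. This adaptation is an assumption, not code from the thread. std::regex_replace's default format follows ECMAScript rules, where $& stands for the whole match and a backslash in the replacement is copied as-is, so the replacement is \$& rather than sed's \\&.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Escapes every regex special character in `s` by prefixing it with a backslash.
// The character class covers . ^ $ | ( ) [ ] { } * + ? and backslash.
std::string regex_escape(const std::string& s) {
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    // Default (ECMAScript) format: the backslash is copied literally,
    // then $& inserts the matched character.
    return std::regex_replace(s, special, R"(\$&)");
}
```

For instance, regex_escape("http://example.com/a?b=1") yields http://example\.com/a\?b=1, which std::regex then matches against the original URL literally.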
JavaScript's escape() function (unrelated to regex escaping) computes a new string in which certain characters have been replaced by a hexadecimal escape sequence. Note: this function was used mostly for URL queries (the part of a URL following ?), not for escaping ordinary string literals, which use the format \xHH.
The characters that need to be escaped are: . ^ $ | ( ) [ ] { } * + ? \
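If you only need that list, a plain loop that prefixes each listed character with a backslash is enough; no regex is involved. A minimal sketch; escape_special is a hypothetical name. It assumes the result is used in a Perl/ECMAScript-style pattern outside a bracket expression. In POSIX basic syntax, for example, \( and \{ are operators rather than literals, so this escaping would not be correct there.

```cpp
#include <cassert>
#include <string>

// Prefixes each regex special character in `s` with a backslash.
// The set is exactly: . ^ $ | ( ) [ ] { } * + ? and backslash.
std::string escape_special(const std::string& s) {
    static const std::string special = ".^$|()[]{}*+?\\";
    std::string out;
    out.reserve(s.size() * 2);  // worst case: every character is escaped
    for (char c : s) {
        if (special.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}
```

This avoids compiling a regex just to escape another one, at the cost of keeping the character list in sync by hand.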
Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.
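#include <boost/regex.hpp>
const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");  // any one of . ^ $ | ( ) [ ] { } * + ? or a backslash
const std::string rep("\\\\&");  // sed format: \\ is a literal backslash, & is the whole match
std::string result = boost::regex_replace(url_to_escape, esc, rep,
                                          boost::match_default | boost::format_sed);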
(The flag boost::format_sed selects sed's replacement-string format. In sed, & in the replacement outputs whatever was matched by the whole expression, and \\ outputs a literal backslash.)
Or, if you are not comfortable with sed's replacement-string format, just change the flag to boost::format_perl and use the familiar $& to refer to whatever was matched by the whole expression:
const std::string rep("\\\\$&");  // Perl format: \\ is a literal backslash, $& is the whole match
std::string result = boost::regex_replace(url_to_escape, esc, rep,
                                          boost::match_default | boost::format_perl);
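To check that escaping actually matters, here is a sketch of the same sed-style replacement using std::regex with std::regex_constants::format_sed (so no Boost is needed), plus a search showing that an unescaped URL matches text it should not. The names escape_for_regex and contains_url and the sample URL are illustrative only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Same technique as the Boost answer, using std::regex and the sed format:
// the C++ literal "\\\\&" is the replacement \\& (a backslash, then the whole match).
std::string escape_for_regex(const std::string& s) {
    static const std::regex esc(R"([.^$|()\[\]{}*+?\\])");
    return std::regex_replace(s, esc, "\\\\&", std::regex_constants::format_sed);
}

// True if `url`, used as a regex, is found anywhere in `text`.
// With escape == false the URL's '.' and '?' keep their regex meanings.
bool contains_url(const std::string& text, const std::string& url, bool escape) {
    const std::regex re(escape ? escape_for_regex(url) : url);
    return std::regex_search(text, re);
}
```

With url = "example.com/page?id=1", the unescaped pattern accepts "exampleXcom/pagid=1", because . matches any character and ? makes the preceding e optional. The escaped pattern example\.com/page\?id=1 only matches the literal URL.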
Using code from Dav (plus a fix from the comments), I created an ASCII/Unicode function, regex_escape():
#include <boost/regex.hpp>
std::wstring regex_escape(const std::wstring& string_to_escape) {
    // L"..." literals rather than _T(...), so this also compiles in non-Unicode builds
    static const boost::wregex re_boostRegexEscape(L"[.^$|()\\[\\]{}*+?\\\\]");
    const std::wstring rep(L"\\\\&");  // sed format: literal backslash + whole match
    return boost::regex_replace(string_to_escape, re_boostRegexEscape, rep,
                                boost::match_default | boost::format_sed);
}
For an ASCII version, use std::string/boost::regex instead of std::wstring/boost::wregex, with plain (non-wide) string literals for the pattern and replacement.
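Instead of maintaining two copies, the ASCII and wide versions can share one function template. This sketch assumes std::regex rather than Boost, and regex_escape and widen_ascii are illustrative names. The pattern and replacement are pure ASCII, so copying them character by character into a basic_string<CharT> is a safe way to widen them.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Copies an ASCII-only narrow string into a basic_string<CharT>.
template <class CharT>
std::basic_string<CharT> widen_ascii(const std::string& s) {
    return std::basic_string<CharT>(s.begin(), s.end());
}

// One escaper for std::string, std::wstring, etc.
// Each instantiation keeps its own compiled regex and format string.
template <class CharT>
std::basic_string<CharT> regex_escape(const std::basic_string<CharT>& s) {
    static const std::basic_regex<CharT> special(
        widen_ascii<CharT>(R"([.^$|()\[\]{}*+?\\])"));
    // ECMAScript format: literal backslash, then $& (the whole match)
    static const std::basic_string<CharT> fmt = widen_ascii<CharT>(R"(\$&)");
    return std::regex_replace(s, special, fmt);
}
```

Call it with an explicit string object, e.g. regex_escape(std::wstring(L"a.b")), because template argument deduction does not work on a bare string literal.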