std::regex escape backslashes in file path

I'm trying to create a std::regex(__FILE__) as part of a unit test which checks some exception output that prints the file name.

On Windows it fails with:

regex_error(error_escape): The expression contained an invalid escaped character, or a trailing escape.

because the __FILE__ macro expansion contains un-escaped backslashes.

Is there a more elegant way to escape the backslashes than looping through the resulting string (e.g. with a std algorithm or some std::string function)?

Nicolas Holthaus asked Aug 30 '16 13:08


1 Answer

File paths can contain many characters that have special meaning in regular expression patterns. Escaping just the backslashes is not enough for robust checking in the general case.

Even a simple path, like C:\Program Files (x86)\Vendor\Product\app.exe, contains several special characters. If you want to turn that into a regular expression (or part of a regular expression), you would need to escape not only the backslashes but also the parentheses and the period (dot).
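To make that concrete, here is a minimal sketch of what goes wrong when only the backslashes are escaped. The helper names EscapeBackslashesOnly and PatternMatchesOwnPath are hypothetical and exist only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: doubles each backslash and escapes nothing else.
std::string EscapeBackslashesOnly(const std::string &s) {
  std::string out;
  out.reserve(s.size());
  for (char ch : s) {
    if (ch == '\\') out.push_back('\\');
    out.push_back(ch);
  }
  return out;
}

// Hypothetical check: builds a pattern from `path` and reports whether that
// pattern matches `path` itself in full.
bool PatternMatchesOwnPath(const std::string &path) {
  const std::regex pattern(EscapeBackslashesOnly(path));
  return std::regex_match(path, pattern);
}
```

With the path above, the pattern compiles but "(x86)" is read as a capture group that matches the text x86 without the parentheses, so the pattern no longer matches the path it was built from. The unescaped dot is a quieter problem: it still matches, but it would also accept any other character in that position.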

Fortunately, we can solve our regular expression problem with more regular expressions:

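#include <regex>
#include <string>

std::string EscapeForRegularExpression(const std::string &s) {
  // The class must list the backslash itself (\\) and be closed with ].
  static const std::regex metacharacters(R"([\\\.\^\$\-\+\(\)\[\]\{\}\|\?\*])");
  return std::regex_replace(s, metacharacters, "\\$&");
}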

(Windows file names can't contain * or ?, but I've included them to keep the function general.)

If you don't abide by the "no raw loops" guideline, a probably faster implementation would avoid regular expressions:

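#include <cstring>
#include <string>

std::string EscapeForRegularExpression(const std::string &s) {
  static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
  std::string out;
  out.reserve(s.size());  // lower bound; grows if anything gets escaped
  for (auto ch : s) {
    // strchr also finds the terminating '\0', so exclude it explicitly.
    if (ch != '\0' && std::strchr(metacharacters, ch))
      out.push_back('\\');
    out.push_back(ch);
  }
  return out;
}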

Although the loop adds some clutter, this approach allows us to drop a level of escaping on the definition of metacharacters, which is a readability win over the regex version.
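For the unit test in the question, a usage sketch might look like the following. The escaping function is repeated so the snippet compiles on its own, and MessageMentionsFile is a hypothetical helper, not part of any test framework.

```cpp
#include <cstring>
#include <regex>
#include <string>

// Same loop-based escaping as above, repeated so this snippet is self-contained.
std::string EscapeForRegularExpression(const std::string &s) {
  static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
  std::string out;
  out.reserve(s.size());
  for (char ch : s) {
    if (ch != '\0' && std::strchr(metacharacters, ch))
      out.push_back('\\');
    out.push_back(ch);
  }
  return out;
}

// Hypothetical test helper: true if `message` contains `file` verbatim,
// checked with a regex built from the escaped file name.
bool MessageMentionsFile(const std::string &message, const std::string &file) {
  const std::regex pattern(EscapeForRegularExpression(file));
  return std::regex_search(message, pattern);
}
```

In a test this could be called as MessageMentionsFile(e.what(), __FILE__), assuming e is the caught exception. In practice the escaped file name would usually be one piece of a larger pattern that also covers the rest of the message.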

Adrian McCarthy answered Oct 03 '22 19:10
