How can I escape all escape-worthy characters in one line of code?

Based on what I see here (accepted answer), it would seem that I could escape strings by doing this:

string s = "Woolworth's";
string t = Regex.Escape(s);
MessageBox.Show(t);

...but stepping through that, I see no difference between s and t (I hoped I'd see "Woolworth\'s" as the value of t instead of "Woolworth's" for both vars).

I could, I guess, do something like this:

    string s = "Woolworth's";
    s = s.Replace("'", "\\'"); // "\\'" is a backslash followed by an apostrophe

...etc., also escaping the following: [, ^, $, ., |, ?, *, +, (, ), and \

...but a "one stop shopping" solution would be preferable.

To be more specific, I need to turn a string entered by a user into something that is acceptable as a string value in an Android arrays.xml file.

For example, the Android build chokes on this:

<item>Woolworth's</item>

...which needs to be this:

<item>Woolworth\'s</item>
B. Clay Shannon-B. Crow Raven asked Jun 04 '14 21:06


2 Answers

Regex.Escape() only escapes regex reserved characters:

Escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes. This instructs the regular expression engine to interpret these characters literally rather than as metacharacters.
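To see this concretely, here is a small console sketch (the class name is just illustrative). The apostrophe passes through Regex.Escape untouched, while the period, the space and the dollar sign each gain a backslash, which is why s and t looked identical in the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexEscapeDemo
{
    static void Main()
    {
        // The apostrophe is not a regex metacharacter, so nothing changes.
        Console.WriteLine(Regex.Escape("Woolworth's"));   // Woolworth's

        // Period, white space and $ are metacharacters, so each gets a backslash.
        Console.WriteLine(Regex.Escape("1.5 $"));          // 1\.5\ \$
    }
}
```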


Match and capture a character class containing the characters you want to escape (note that some characters, such as \ and -, have special meanings inside a character class and need to be escaped):

(['^$.|?*+()\\])

Then replace each match with a backslash followed by a backreference to the captured character:

\\1



In C#:

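using System.Text.RegularExpressions;

string s = "Woolworth's";
// Capture one of ' ^ $ . | ? * + ( ) \ (the C# literal doubles each backslash)
Regex rgx = new Regex("(['^$.|?*+()\\\\])");

// Prefix the captured character ($1) with a literal backslash
string t = rgx.Replace(s, "\\$1");
// Woolworth\'s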

Sam answered Oct 05 '22 23:10


Regex.Escape is not suitable for this context.

It is designed strictly for regular expressions and escapes both too much and too little for this context: it adds backslashes before characters such as . and white space, which the Android resource format does not need escaped, so trying to shoe-horn it into this model will likely break other values. (It doesn't escape ' or " because those characters have no special meaning in a .NET regular expression.)

What matters here is that the <item> element in an Android string resource file gets some special parsing of its text (related to formatting) after it is read from the XML:

If you have an apostrophe or a quote in your string, you must either escape it or enclose the whole string in the other type of enclosing quotes.

As such, a transformation appropriate in this context is simply

s.Replace("'", "\\'").Replace("\"", "\\\"")

or

Regex.Replace(s, "['\"]", "\\$&")
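A minimal sketch putting both transformations side by side, plus the other option from the quoted documentation (enclosing the whole value in double quotes). The class and method names are hypothetical. One C# detail worth noting: the literal "\'" is just an apostrophe, so the replacement text has to be written "\\'" to produce a backslash followed by an apostrophe.

```csharp
using System;
using System.Text.RegularExpressions;

static class AndroidEscaping
{
    // Chained Replace: a backslash before each ' and each ".
    // "\\'" is backslash + apostrophe; "\'" alone would be a no-op.
    public static string EscapeWithReplace(string s) =>
        s.Replace("'", "\\'").Replace("\"", "\\\"");

    // Single regex: the replacement literal "\\$&" is \$& at runtime,
    // i.e. a literal backslash followed by the whole match ($&).
    public static string EscapeWithRegex(string s) =>
        Regex.Replace(s, "['\"]", "\\$&");

    // Hypothetical helper for the "other type of enclosing quotes" option:
    // wrap the value in double quotes so apostrophes need no backslash;
    // any embedded double quotes are still escaped.
    public static string WrapInDoubleQuotes(string s) =>
        "\"" + s.Replace("\"", "\\\"") + "\"";

    static void Main()
    {
        string s = "Woolworth's";
        Console.WriteLine(EscapeWithReplace(s));   // Woolworth\'s
        Console.WriteLine(EscapeWithRegex(s));     // Woolworth\'s
        Console.WriteLine(WrapInDoubleQuotes(s));  // "Woolworth's"
    }
}
```

As I understand the Android string resource documentation, enclosing quotes also stop whitespace from being collapsed, so the double-quote option is not a drop-in equivalent when a value contains runs of spaces.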

(And then, assuming the XML is being properly built via a DOM or LINQ to XML, the XML encoding is taken care of elsewhere - although the rules are more complicated when using formatting vs mixed content styling.)
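As a rough sketch of that last point, assuming the resource file is generated with LINQ to XML: only the Android-level escaping is done by hand, and XElement takes care of XML-level escaping such as & becoming &amp;. The class and method names and the array name "stores" are hypothetical, and the formatting and styling cases are not handled.

```csharp
using System;
using System.Xml.Linq;

static class ArraysXmlBuilder
{
    // Android-level escaping only: a backslash before each ' and each ".
    // XML-level escaping (&, <, >) is left to LINQ to XML.
    static string EscapeForAndroid(string s) =>
        s.Replace("'", "\\'").Replace("\"", "\\\"");

    // Builds a <resources> document holding one <string-array>
    // whose <item> elements are the escaped user strings.
    public static XDocument Build(string arrayName, params string[] values)
    {
        var array = new XElement("string-array", new XAttribute("name", arrayName));
        foreach (string v in values)
            array.Add(new XElement("item", EscapeForAndroid(v)));
        return new XDocument(new XElement("resources", array));
    }

    static void Main()
    {
        XDocument doc = Build("stores", "Woolworth's", "Marks & Spencer");
        Console.WriteLine(doc);
    }
}
```

Printing doc should show <item>Woolworth\'s</item> and <item>Marks &amp; Spencer</item> inside the <string-array> element.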

user2864740 answered Oct 06 '22 01:10