I need to escape special characters in a String.
Guava provides the Escaper class, which does exactly this:
Escaper escaper = Escapers.builder()
.addEscape('[', "\\[")
.addEscape(']', "\\]")
.build();
String escapedStr = escaper.escape("This is a [test]");
System.out.println(escapedStr);
// -> prints "This is a \[test\]"
Now that I have an escaped String, I need to unescape it, and I can't find anything in Guava to do this.
I was expecting Escaper to have an unescape() method, but it doesn't.
Edit: I'm aware that unescaping can be tricky, even impossible in some nonsensical cases.
For example, this Escaper usage can lead to ambiguities:
Escaper escaper = Escapers.builder()
.addEscape('@', " at ")
.addEscape('.', " dot ")
.build();
Unless the escaped data contains only email addresses and nothing more, you can't safely get your data back by unescaping it.
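To make the ambiguity concrete, here is a minimal sketch using only the JDK. The escape() helper and the AmbiguousEscapeDemo class are hypothetical stand-ins that apply the same two replacement rules as the Escaper above; they are not part of Guava. Two different inputs produce the same escaped output, so no unescaper could tell them apart:

```java
public class AmbiguousEscapeDemo {
    // Hypothetical plain-JDK stand-in for the Escaper above (same two rules).
    static String escape(String s) {
        return s.replace("@", " at ").replace(".", " dot ");
    }

    public static void main(String[] args) {
        String a = "john@example.com";
        String b = "john at example dot com";
        System.out.println(escape(a)); // john at example dot com
        System.out.println(escape(b)); // john at example dot com
        // Both inputs collapse to the same escaped text.
        System.out.println(escape(a).equals(escape(b))); // true
    }
}
```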
A good example of a safe usage of the Escaper is HTML entities:
Escaper escaper = Escapers.builder()
.addEscape('&', "&amp;")
.addEscape('<', "&lt;")
.addEscape('>', "&gt;")
.build();
Here, you can safely escape any text, incorporate it in an HTML page and unescape it at any time to display it, because you have covered every possible ambiguity.
In conclusion, I don't see why unescaping is so controversial. I think it is the developer's responsibility to use this class properly, knowing their data and avoiding ambiguities. Escaping, by definition, means you will eventually need to unescape. Otherwise, it's obfuscation or some other concept.
No, it does not. And apparently, this is intentional. Quoting from this discussion where Chris Povirk answered:
The use case for unescaping is less clear to me. It's generally not possible to even identify the escaped source text without a parser that understands the language. For example, if I have the following input:
String s = "foo\n\"bar\"\n\\";
Then my parser has to already understand \n, \", and \\ in order to identify that ...foo\n\"bar\"\n\\... is the text to be "unescaped." In other words, it has to do the unescaping already. The situation is similar with HTML and other formats: We don't need an unescaper so much as we need a parser.
So it looks like you'll have to do it yourself.
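As a rough illustration of doing it yourself, here is a minimal sketch of a reverse mapping for the bracket example from the question. SimpleUnescaper and its add() and unescape() methods are hypothetical names, not part of Guava, and the approach assumes the escape rules are unambiguous. Note that the bracket Escaper does not escape the backslash itself, so an original string that already contains "\[" would not round-trip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleUnescaper {
    // Maps each escaped sequence back to the character it replaced.
    private final Map<String, Character> reverse = new LinkedHashMap<>();

    public SimpleUnescaper add(char original, String replacement) {
        if (replacement.isEmpty()) {
            // An empty replacement cannot be detected in the escaped text.
            throw new IllegalArgumentException("replacement must not be empty");
        }
        reverse.put(replacement, original);
        return this;
    }

    public String unescape(String escaped) {
        StringBuilder out = new StringBuilder(escaped.length());
        int i = 0;
        while (i < escaped.length()) {
            String match = null;
            // Prefer the longest escaped sequence that starts at position i.
            for (String candidate : reverse.keySet()) {
                if (escaped.startsWith(candidate, i)
                        && (match == null || candidate.length() > match.length())) {
                    match = candidate;
                }
            }
            if (match != null) {
                out.append(reverse.get(match));
                i += match.length();
            } else {
                out.append(escaped.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SimpleUnescaper unescaper = new SimpleUnescaper()
                .add('[', "\\[")
                .add(']', "\\]");
        System.out.println(unescaper.unescape("This is a \\[test\\]"));
        // -> prints "This is a [test]"
    }
}
```

The same class should work for the HTML entity rules, provided every registered sequence maps back to exactly one character.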
If you just need to unescape HTML entities, Unicode characters and control characters like \n or \t, you can simply use the StringEscapeUtils class from Apache Commons Lang.