Does Guava provide a method to unescape a String?

Tags: java, guava

I need to escape special characters in a String.

Guava provides the Escaper class, which does exactly this:

Escaper escaper = Escapers.builder()
        .addEscape('[', "\\[")
        .addEscape(']', "\\]")
        .build();

String escapedStr = escaper.escape("This is a [test]");

System.out.println(escapedStr);
// -> prints "This is a \[test\]"

Now that I have an escaped String, I need to unescape it and I can't find anything in Guava to do this.

I was expecting Escaper to have an unescape() method, but that isn't the case.

Edit: I'm aware that unescaping can be tricky, even impossible in some nonsensical cases.

For example, this Escaper usage can lead to ambiguities:

Escaper escaper = Escapers.builder()
        .addEscape('@', " at ")
        .addEscape('.', " dot ")
        .build();

Unless the escaped data contains only email addresses and nothing more, you can't safely get your data back by unescaping it.
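To make the ambiguity concrete, here is a minimal sketch that uses plain String.replace as a stand-in for the Escaper above. The class and method names are hypothetical and only serve the illustration; they are not part of Guava.

```java
// Hypothetical demo class; String.replace stands in for the Guava Escaper above.
final class AmbiguousEscapeDemo {

    // Mirrors the escaper from the question: '@' -> " at ", '.' -> " dot ".
    static String escape(String s) {
        return s.replace("@", " at ").replace(".", " dot ");
    }

    // Naive inverse: maps every " at " and " dot " back,
    // whether or not it was produced by escape().
    static String naiveUnescape(String s) {
        return s.replace(" dot ", ".").replace(" at ", "@");
    }

    public static void main(String[] args) {
        String original = "look at me@example.org";
        String escaped = escape(original);
        System.out.println(escaped);                // look at me at example dot org
        System.out.println(naiveUnescape(escaped)); // look@me@example.org
    }
}
```

The " at " that was already present in the input is indistinguishable from the one produced by escaping, so the round trip does not return the original text.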

A good example of a safe usage of the Escaper is HTML entities:

Escaper escaper = Escapers.builder()
        .addEscape('&', "&amp;")
        .addEscape('<', "&lt;")
        .addEscape('>', "&gt;")
        .build();

Here, you can safely escape any text, embed it in an HTML page, and unescape it at any time to display it, because every possible ambiguity is covered.
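As a rough sketch of why this case round-trips, the following uses plain String.replace rather than Guava, and assumes the standard &amp;, &lt; and &gt; entities. The class name is hypothetical, and only these three entities are handled.

```java
// Hypothetical helper; handles only the three entities discussed above.
final class MinimalHtmlUnescaper {

    static String escape(String s) {
        // '&' goes first, so the ampersands introduced by "&lt;" and "&gt;"
        // are not escaped a second time.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    static String unescape(String s) {
        // "&amp;" goes last, so "&amp;lt;" decodes to the literal text "&lt;"
        // and not to "<".
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String original = "<b>AT&T</b> &lt;";
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original));
    }
}
```

Because '&' itself is escaped, every '&' in the escaped text starts exactly one known entity, which is what makes the inverse well defined.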

In conclusion, I don't see why unescaping is so controversial. I think it is the developer's responsibility to use this class properly, knowing their data and avoiding ambiguities. Escaping, by definition, means you will eventually need to unescape. Otherwise, it's obfuscation or some other concept.

Eric Citaire asked Dec 04 '15 15:12


2 Answers

No, it does not, and apparently this is intentional. Quoting Chris Povirk's answer from a discussion on the subject:

The use case for unescaping is less clear to me. It's generally not possible to even identify the escaped source text without a parser that understands the language. For example, if I have the following input:

String s = "foo\n\"bar\"\n\\";

Then my parser has to already understand \n, \", and \\ in order to identify that...

foo\n\"bar\"\n\\

...is the text to be "unescaped." In other words, it has to do the unescaping already. The situation is similar with HTML and other formats: We don't need an unescaper so much as we need a parser.

So it looks like you'll have to do it yourself.
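For the bracket example in the question, doing it yourself could look roughly like the sketch below. It assumes a backslash is the escape character and that a backslash followed by any character stands for that character. The class name is hypothetical. Note that the Escaper in the question does not escape the backslash itself, so a clean round trip would also need something like .addEscape('\\', "\\\\") on the escaping side.

```java
// Hypothetical hand-written unescaper for a backslash-based escaping scheme.
final class BracketUnescaper {

    // A backslash followed by any character yields that character;
    // everything else is copied unchanged.
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                // Drop the backslash and copy the escaped character literally.
                out.append(escaped.charAt(++i));
            } else {
                // Ordinary character, or a dangling backslash at the very end.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("This is a \\[test\\]")); // This is a [test]
    }
}
```

The single left-to-right pass matters: each backslash consumes exactly one following character, so an escaped backslash is never mistaken for the start of another escape.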

Tunaki answered Nov 06 '22 22:11


If you just need to unescape HTML entities, Unicode escapes, and control characters like \n or \t, you can simply use the StringEscapeUtils class from Apache Commons Lang.

Emmanuel Bourg answered Nov 06 '22 22:11