
Escaped HTML won't unescape (now: unescaped HTML won't escape back)

I'm currently using the Apache Commons Lang library.

When I tried unescaping this string: &#128512; it returned the same string: &#128512;

String characters = "&#128512;";
StringEscapeUtils.unescapeHtml(characters);

Output: &#128512;

But when I tried unescaping a String with fewer digits, it worked:

String characters = "&#12851;";
StringEscapeUtils.unescapeHtml(characters);

Output: ㈳

Any ideas? When I tried unescaping the String "&#128512;" with an online unescaping utility, it worked, so maybe it's a bug in the Apache Commons Lang library? Or can anyone recommend another library?

Thanks.

UPDATES:

I'm now able to unescape the String successfully. The problem now is that when I try to escape the result of that unescape, it doesn't give back the original String (&#128512;).

lorraine asked Feb 05 '13 10:02


1 Answer

unescapeHtml() leaves &#128512; untouched because, as the documentation says, it only unescapes HTML 4.0 entities, which it limits to 65,536 characters (code points 0 to 65,535). Unfortunately, 128,512 is far beyond that limit.
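
To see where a 16-bit limit bites, here is a minimal sketch using only JDK calls (no Commons Lang; the class name CharLimitDemo is made up for illustration). A Java char holds values 0 to 65,535, so code point 12,851 fits in one char while 128,512 needs two (a surrogate pair):

```java
final class CharLimitDemo {
    /** True if the code point fits in a single 16-bit Java char. */
    static boolean fitsInChar(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    /** Number of Java chars needed to represent the code point. */
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(fitsInChar(12851));   // true
        System.out.println(fitsInChar(128512));  // false
        System.out.println(charsNeeded(128512)); // 2
    }
}
```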

Have you tried using unescapeXml()?

XML supports character references up to code point 1,114,111 (10FFFFh).
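
If you would rather not depend on a particular library version, the round trip can also be sketched with the JDK alone. This is not how Commons Lang implements it; the class NumericEntities and its two methods are hypothetical names, and the sketch only handles decimal references such as &#128512; (no named entities, no hex, and it does not escape &, < or >). The key point for escaping back is to iterate by code point rather than by char, since a character above 65,535 is stored as two chars; escaping each char separately would probably give two references instead of the original one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntities {
    // Matches decimal numeric character references such as &#128512;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    /** Replaces decimal references with the characters they denote. */
    static String unescape(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                // Above 65,535 this appends a surrogate pair (two chars).
                out.appendCodePoint(codePoint);
            } else {
                // Beyond 1,114,111: leave the reference untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    /** Escapes every non-ASCII code point as a decimal reference. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            // Read a whole code point, not a single char.
            int codePoint = input.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

With this sketch, NumericEntities.escape(NumericEntities.unescape("&#128512;")) gives back "&#128512;".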

wassup answered Oct 21 '22 06:10