I have an ASCII String, with HTML entities, like:
&agrave;
&uml;
&ccedil;
I need to replace those entities with the actual characters they represent, as UTF-8 text. Is there an easy way to do that in Java?
Something where:
Clazz.method("a&agrave;","UTF-8")
returns "aà"
or something like that?
UTF-8 is a variable-width character encoding. It is as compact as ASCII for ASCII text, but it can also represent any Unicode character by using two to four bytes for characters outside the ASCII range. UTF stands for Unicode Transformation Format; the '8' means the encoding works in 8-bit units.
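To make the variable width concrete, here is a small sketch that counts the bytes a string occupies in UTF-8. The class and method names (Utf8Width, utf8Length) are made up for illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

class Utf8Width {
    // Number of bytes the given text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));       // ASCII letter: 1 byte
        System.out.println(utf8Length("\u00e0"));  // a with grave accent (U+00E0): 2 bytes
        System.out.println(utf8Length("\u20ac"));  // euro sign (U+20AC): 3 bytes
    }
}
```

The non-ASCII characters are written as Unicode escapes so the example does not depend on the encoding of the source file.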
A character encoding determines how a sequence of bytes is interpreted as characters. The same bytes can denote different characters in different encodings. Since JDK 18, Java uses UTF-8 as its default charset; earlier versions take the default from the platform.
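As a rough illustration of the same bytes meaning different things, the sketch below decodes one two-byte sequence twice, once as UTF-8 and once as ISO-8859-1. The class and method names are hypothetical; the String constructor and charset constants are standard JDK.

```java
import java.nio.charset.StandardCharsets;

class SameBytes {
    static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xC3 0xA0 is the UTF-8 encoding of U+00E0 (a with grave accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA0};

        // Decoded as UTF-8: a single character.
        System.out.println(asUtf8(bytes).length());   // 1
        // Decoded as ISO-8859-1: two characters, U+00C3 and U+00A0.
        System.out.println(asLatin1(bytes).length()); // 2
    }
}
```

Passing the charset explicitly, as here, avoids relying on whatever default the running JVM happens to use.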
Take a look at org.apache.commons.lang.StringEscapeUtils.unescapeHtml(...). Apparently it understands all character entities defined in HTML 4.
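With Commons Lang on the classpath, the call would presumably look like StringEscapeUtils.unescapeHtml("a&agrave;"). If adding the library is not an option, the idea can be sketched with the JDK alone. The following is not the Commons Lang implementation, just a minimal stand-in: EntityDecoder and decode are hypothetical names, and the table of named entities covers only a handful of entries rather than the full HTML 4 set.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Matches &name; as well as numeric forms such as &#224; and &#xE0;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX]?[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);");

    // Tiny illustrative subset of the HTML 4 named entities (an assumption,
    // not the complete table a real library would ship).
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"",
            "agrave", "\u00e0", "uml", "\u00a8", "ccedil", "\u00e7");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = resolve(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown entity: leave it untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the character(s) for an entity body, or null if unrecognized.
    private static String resolve(String body) {
        if (!body.startsWith("#")) {
            return NAMED.get(body);
        }
        try {
            boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // e.g. "&#abc;" is not a valid decimal reference
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("a&agrave; &#231; &#xE0; &unknown;"));
    }
}
```

Note that the result is an ordinary Java String, which has no encoding of its own. UTF-8 only comes into play when the string is written out as bytes, for example with getBytes(StandardCharsets.UTF_8), so the "UTF-8" argument from the question is not needed for the decoding step.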