
Convert escaped Unicode character back to actual character

I have the following value in a String variable in Java, in which a non-ASCII character appears as an escape sequence, like this:

Dodd\u2013Frank 

instead of

Dodd–Frank 

(Assume that I don't have control over how this value is assigned to this string variable)

How do I convert (decode) it properly and store the result back in a String variable?

I found the following code

Charset.forName("UTF-8").encode(str); 

This returns a ByteBuffer, but I want a String back.

Edit:

Some additional information.

When I use System.out.println(str); I get

Dodd\u2013Frank 

I am not sure what the correct terminology is (UTF-8 or Unicode). Pardon me for that.

Sudar asked Dec 04 '12 10:12




2 Answers

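Try

str = org.apache.commons.lang3.StringEscapeUtils.unescapeJava(str); 

from Apache Commons Lang. In newer releases this class is deprecated in favor of org.apache.commons.text.StringEscapeUtils from Apache Commons Text, which offers the same unescapeJava method.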
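
If adding a library is not an option, the same idea can be approximated with java.util.regex alone. The following is only a sketch under stated assumptions: RegexUnescape and its unescape method are hypothetical names, not part of Apache Commons or the JDK, and unlike unescapeJava it decodes only the four-hex-digit \uXXXX form. It leaves other escapes such as \n or \t untouched and does not treat a doubled backslash specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class RegexUnescape {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the four hex digits as one UTF-16 code unit.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against a decoded backslash or dollar sign.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash in the literal yields a real backslash at run time.
        String raw = "Dodd\\u2013Frank";
        System.out.println(unescape(raw));
    }
}
```

Characters outside the Basic Multilingual Plane arrive as two consecutive escapes (a surrogate pair); because each escape is decoded to one char, the pair ends up adjacent in the result and forms the intended character.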

jlordo answered Sep 26 '22 12:09


java.util.Properties

You can take advantage of the fact that java.util.Properties supports strings with '\uXXXX' escape sequences and do something like this:

// Requires java.util.Properties and java.io.StringReader; load() can throw IOException.
Properties p = new Properties();
p.load(new StringReader("key=" + yourInputString));
System.out.println("Unescaped value: " + p.getProperty("key"));

Inelegant, but functional.

To handle the possible IOException, you may want a try-catch.

import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

Properties p = new Properties();
try {
    p.load(new StringReader("key=" + input));
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Unescaped value: " + p.getProperty("key"));
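
For reuse, the snippet above can be wrapped in a small method. This is a minimal sketch, assuming the input is a single line; the class name PropertiesUnescape and the method name unescape are made up for illustration. It rethrows the IOException as an UncheckedIOException, since reading from an in-memory StringReader is not expected to fail.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical wrapper around the Properties trick; names are illustrative only.
final class PropertiesUnescape {
    static String unescape(String input) {
        Properties p = new Properties();
        try {
            // The property-file parser decodes backslash-u escapes in the value.
            p.load(new StringReader("key=" + input));
        } catch (IOException e) {
            // Not expected for an in-memory reader, but load() declares it.
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }

    public static void main(String[] args) {
        // The doubled backslash in the literal yields a real backslash at run time.
        String raw = "Dodd\\u2013Frank";
        System.out.println(unescape(raw));
    }
}
```

Because the input is parsed as a property-file line, a few side effects apply: leading whitespace in the value is dropped, a line break in the input ends the value, a backslash before an ordinary character is silently removed, and a malformed \u sequence makes load() throw an IllegalArgumentException.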

drobert answered Sep 25 '22 12:09