
Unicode to string conversion in Java

I am building a toy language. The syntax \#0061 is supposed to convert the given Unicode code point to a character:

String temp = yytext().substring(2);

I then tried adding '\u' to the string, but that generated an error.

I also tried "\\" + "u" + temp, but this does not do any conversion.

I am basically trying to convert Unicode to a character by supplying only '0061' to a method, help.

ferronrsmith asked Dec 20 '09 04:12



2 Answers

Strip the leading '\#' and use Integer.parseInt("0061", 16) to convert the hex digits to an int. Then cast the result to a char.

(If you had implemented the lexer by hand, an alternative would be to do the conversion on the fly as your lexer matches the Unicode literal. But on rereading the question, I see that you are using a lexer generator ... good move!)
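As a rough sketch of the steps above, assuming the lexer hands over the whole matched token text (for example the six characters \#0061): the class name EscapeDecoder and the method decode are made up for illustration and are not part of any lexer generator's API.

```java
final class EscapeDecoder {
    // Hypothetical helper: expects the full token text, i.e. a backslash,
    // a '#', and then the hex digits.
    static char decode(String tokenText) {
        String hex = tokenText.substring(2);    // drop the leading "\#"
        int value = Integer.parseInt(hex, 16);  // interpret the digits as base 16
        return (char) value;                    // narrow the int to a char
    }

    public static void main(String[] args) {
        // In Java source the backslash itself must be escaped, hence "\\#0061".
        System.out.println(decode("\\#0061")); // prints "a"
    }
}
```

This also shows why "\\" + "u" + temp does nothing useful: \uXXXX escapes are translated by the compiler when it reads the source file, so a string assembled at run time is just the six characters \, u, 0, 0, 6, 1.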

Stephen C answered Nov 03 '22 01:11


I am basically trying to convert Unicode to a character by supplying only '0061' to a method, help.

// Parses the hex digits (e.g. "0061") as a base-16 int and narrows it to a char.
char fromUnicode(String codePoint) {
  return (char) Integer.parseInt(codePoint, 16);
}

You need to handle bad inputs and such, but that will work otherwise.
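One possible way to handle the bad inputs, as a sketch only: it assumes the language always writes exactly four hex digits, as in \#0061. The class name SafeUnicode and the choice of IllegalArgumentException are illustrative assumptions, not something the question specifies.

```java
final class SafeUnicode {
    // Hypothetical validated variant of fromUnicode.
    // Assumption: valid input is exactly four hex digits.
    static char fromUnicode(String codePoint) {
        // Check the shape first: Integer.parseInt would also accept a leading
        // sign such as "-061", which would then wrap around when cast to char.
        if (codePoint == null || !codePoint.matches("[0-9A-Fa-f]{4}")) {
            throw new IllegalArgumentException(
                "Expected four hex digits, got: " + codePoint);
        }
        return (char) Integer.parseInt(codePoint, 16);
    }

    public static void main(String[] args) {
        System.out.println(fromUnicode("0061")); // prints "a"
    }
}
```

Four hex digits can never exceed FFFF, so the result always fits in a single char. If the toy language later allows longer escapes, Character.toChars(int) would be the place to look, since it returns a surrogate pair for code points above FFFF.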

danben answered Nov 03 '22 01:11