I am building a toy language. The syntax \#0061 is supposed to convert the given Unicode code point to a character:
String temp = yytext().substring(2);
Then I tried to append "\u" to the string, but I noticed that generated an error. I also tried "\\" + "u" + temp; that compiles, but it does not perform any conversion.
I am basically trying to convert a Unicode code point to a character by supplying only '0061' to a method. Help!
Java's char type is a 16-bit UTF-16 code unit: the lowest value is \u0000 and the highest is \uFFFF. (Unicode itself now extends beyond \uFFFF; code points above that range are represented in Java as surrogate pairs.) UTF-8, by contrast, is a variable-width encoding: it is as compact as ASCII for ASCII text, but can encode any Unicode character with some increase in the size of the file.
According to section 3.3 of the Java Language Specification (JLS), a Unicode escape consists of a backslash character (\) followed by one or more 'u' characters and exactly four hexadecimal digits.
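To illustrate that rule, here is a minimal check (plain Java, nothing beyond the standard library): the compiler translates the escape \u0061 into the character 'a' before the rest of compilation even sees it.

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // The Unicode escape \u0061 is translated by the compiler to the char 'a'.
        char c = '\u0061';
        System.out.println(c == 'a'); // prints "true"
    }
}
```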
Here \uFFFF describes how the Unicode value appears in the input I am reading (say, an ASCII file); it is not a Java literal.
Internally, Java uses the Unicode character set. Unicode began as a two-byte extension of the one-byte ISO Latin-1 character set, which in turn is an eight-bit superset of the seven-bit ASCII character set.
Strip the '#' and use Integer.parseInt("0061", 16) to convert the hex digits to an int. Then cast the result to a char.
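The two steps above can be sketched end to end; the token text "\#0061" here is just an assumed example of what yytext() would return for this syntax.

```java
public class Convert {
    public static void main(String[] args) {
        // Hypothetical token text as matched by the lexer: \#0061
        String token = "\\#0061";
        String hex = token.substring(2);          // strip the leading "\#" -> "0061"
        int codeUnit = Integer.parseInt(hex, 16); // parse the hex digits -> 0x61
        char c = (char) codeUnit;                 // cast to char -> 'a'
        System.out.println(c); // prints "a"
    }
}
```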
(If you had implemented the lexer by hand, an alternative would be to do the conversion on the fly as the lexer matches the Unicode literal. But on rereading the question, I see that you are using a lexer generator ... good move!)
"I am basically trying to convert Unicode to a character by supplying only '0061' to a method, help."
char fromUnicode(String codePoint) {
    return (char) Integer.parseInt(codePoint, 16);
}
You will need to handle bad inputs, but otherwise that will work.
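For example, a slightly more defensive variant might look like this; the exact validation rules (exactly four hex digits, non-null input) are assumptions on my part, not part of the original answer.

```java
public class Unicode {
    static char fromUnicode(String codePoint) {
        // Assumed policy: require exactly four hex digits, as in the \#0061 syntax.
        if (codePoint == null || codePoint.length() != 4) {
            throw new IllegalArgumentException("expected four hex digits, got: " + codePoint);
        }
        // parseInt throws NumberFormatException on non-hex input.
        return (char) Integer.parseInt(codePoint, 16);
    }

    public static void main(String[] args) {
        System.out.println(fromUnicode("0061")); // prints "a"
    }
}
```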