
How to construct a string based on UTF8?

Tags:

java

unicode

I think I can use \u**** to construct a character based on UTF-16. How do I construct a string using UTF-8?

Adam Lee asked Dec 15 '22 23:12



1 Answer

Strings in Java are encoding-agnostic (they use UTF-16 internally, but that doesn't matter here). The codes you enter after \u are Unicode code points, not the actual binary representation of characters (strictly speaking they are UTF-16 code units, which coincide with code points for everything up to U+FFFF). Each character has an associated code point. Different encodings define how code points are mapped to a given binary representation.
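To make the "encoding-agnostic" point concrete, here is a minimal sketch, not part of the original answer. It assumes you already have raw bytes and know which encoding produced them; the class name DecodeDemo and its helper methods are hypothetical names chosen for illustration. It shows that decoding the UTF-8 bytes and the UTF-16 bytes of the same character yields equal strings:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the names here are illustrative only.
class DecodeDemo {
    // Build a String from bytes that are known to be UTF-8.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Build a String from bytes that are known to be big-endian UTF-16 (no BOM).
    static String fromUtf16BE(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // U+20AC encoded as UTF-8
        byte[] utf16 = {(byte) 0x20, (byte) 0xAC};             // U+20AC encoded as UTF-16BE

        String a = fromUtf8(utf8);
        String b = fromUtf16BE(utf16);

        System.out.println(a.equals(b));                           // true
        System.out.println(Integer.toHexString(a.codePointAt(0))); // 20ac
    }
}
```

Once decoded, the String no longer remembers which encoding its bytes came from, so "constructing a string using UTF-8" in practice means decoding UTF-8 bytes with new String(bytes, StandardCharsets.UTF_8).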

That being said, you construct a string using code points and then convert it to any encoding with the getBytes() method. For example, for the Euro sign (€):

"\u20AC".getBytes("UTF-8");   //-30,  -126, -84
"\u20AC".getBytes("UTF-16");  //-2, -1, 32, -84
"\u20AC".getBytes("UTF-32");  // 0,  0, 32, -84

Worth remembering: UTF-16 doesn't always use 16 bits per character. Code points above U+FFFF are encoded as two 16-bit code units (a surrogate pair).
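As a rough illustration of that last point (a sketch added here, not taken from the answer), the snippet below builds a string from a code point outside the 16-bit range. The class name SurrogateDemo and the helper fromCodePoint are hypothetical; Character.toChars and String.codePointCount are standard JDK methods.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the names here are illustrative only.
class SurrogateDemo {
    // Build a String from one Unicode code point, which may lie above U+FFFF.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E); // MUSICAL SYMBOL G CLEF, outside the BMP

        System.out.println(clef.length());                                // 2 (UTF-16 code units)
        System.out.println(clef.codePointCount(0, clef.length()));        // 1 (code point)
        System.out.println(clef.equals("\uD834\uDD1E"));                  // true (the surrogate pair)
        System.out.println(clef.getBytes(StandardCharsets.UTF_8).length); // 4 (bytes in UTF-8)
    }
}
```

This is why a single \uXXXX escape cannot express such characters: they need two escapes in source code, or a helper like the one above.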

Tomasz Nurkiewicz answered Dec 30 '22 16:12
