
How to get UTF-16 byte array?

Tags: c#, encoding

I have a UTF-8 string and I need its bytes in UTF-16 encoding. How can I convert my string to a UTF-16 byte array?

Update:
I mean that we have Encoding.Unicode.GetBytes() and even Encoding.UTF8.GetBytes() to get the byte array of a string, but what about UTF-16? There is no Encoding.UTF16.GetBytes(), so how can I get the byte array?

Afshin Mehrabani asked Sep 09 '13 12:09



1 Answer

For little-endian UTF-16, use Encoding.Unicode.

For big-endian UTF-16, use Encoding.BigEndianUnicode.
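As a minimal sketch of the two properties above: in .NET, Encoding.Unicode is the UTF-16 encoding the question is looking for, so no separate Encoding.UTF16 is needed. The class name Utf16Bytes and the sample text are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Text;

public static class Utf16Bytes
{
    public static void Main()
    {
        // Sample text (an assumption for illustration): 'h' is U+0068, '\u00E9' is U+00E9.
        string text = "h\u00E9";

        // Little-endian UTF-16: low byte of each 16-bit code unit comes first.
        byte[] little = Encoding.Unicode.GetBytes(text);

        // Big-endian UTF-16: high byte of each 16-bit code unit comes first.
        byte[] big = Encoding.BigEndianUnicode.GetBytes(text);

        Console.WriteLine(BitConverter.ToString(little)); // 68-00-E9-00
        Console.WriteLine(BitConverter.ToString(big));    // 00-68-00-E9
    }
}
```

Neither call writes a byte-order mark into the array; GetBytes returns only the encoded characters.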

Alternatively, construct an explicit instance of UnicodeEncoding which allows you to specify the endianness, whether or not to include byte-order marks, and whether to throw an exception on invalid data.
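The explicit-instance option might look like the sketch below. The helper ToUtf16 and its parameter names are hypothetical, introduced only for illustration; the constructor used is UnicodeEncoding(bool bigEndian, bool byteOrderMark, bool throwOnInvalidBytes). Note that GetBytes does not emit the byte-order mark itself, so this sketch prepends the bytes returned by GetPreamble when a BOM is requested.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Hypothetical helper: encodes text as UTF-16 with a chosen byte order and optional BOM.
    public static byte[] ToUtf16(string text, bool bigEndian = false, bool includeBom = false)
    {
        // throwOnInvalidBytes: true makes invalid data (such as a lone surrogate)
        // raise an exception instead of being silently replaced.
        var encoding = new UnicodeEncoding(bigEndian, includeBom, throwOnInvalidBytes: true);

        byte[] body = encoding.GetBytes(text);
        if (!includeBom)
        {
            return body;
        }

        // GetPreamble returns the BOM for this instance (FF FE or FE FF).
        byte[] bom = encoding.GetPreamble();
        byte[] result = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(body, 0, result, bom.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(ToUtf16("A")));                                    // 41-00
        Console.WriteLine(BitConverter.ToString(ToUtf16("A", bigEndian: true, includeBom: true))); // FE-FF-00-41
    }
}
```

If the starting point is a UTF-8 byte array rather than a .NET string, Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes) performs the transcoding in one call; utf8Bytes here is an assumed variable name.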

Jon Skeet answered Sep 22 '22 05:09