UTF8 byte[] to string conversion

I have a UTF-8 byte[] of effectively unlimited (i.e. very large) size. I want to truncate it to the first 1024 bytes and then convert that to a string.

Encoding.UTF8.GetString(byte[], int, int) does that for me: it takes only the first 1024 bytes and returns them decoded as a string.

But if the last character in that range is a multi-byte UTF-8 character (for example one encoded in 2 bytes) whose first byte falls inside the 1024-byte range and whose second byte falls outside it, the converted string shows ? for that character.

Is there any way to keep this ? out of the converted string?

asked Apr 20 '16 at 09:04 by pratik03

1 Answer

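That's what the Decoder class is for. It lets you stream byte data into char data while keeping enough state to handle partial code points correctly (charBuffer below is a char[] large enough for the output, e.g. new char[Encoding.UTF8.GetMaxCharCount(1024)]):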

int charCount = Encoding.UTF8.GetDecoder().GetChars(buffer, 0, 1024, charBuffer, 0);
string text = new string(charBuffer, 0, charCount);
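A self-contained sketch of the same approach, wrapped in a hypothetical helper named DecodePrefix (the name and the maxBytes parameter are assumptions, not part of the answer). The five-argument Decoder.GetChars overload does not flush, so the bytes of a character cut off at the boundary stay inside the decoder instead of being turned into U+FFFD; the buffer is sized with Encoding.UTF8.GetMaxCharCount so it is always large enough.

```csharp
using System;
using System.Text;

// Hypothetical helper class; names are illustrative only.
public static class Utf8Truncation
{
    // Decodes at most maxBytes bytes from the start of `bytes`.
    // A multi-byte character straddling the cut is held back in the
    // Decoder's state (no flush), so it is dropped rather than replaced.
    public static string DecodePrefix(byte[] bytes, int maxBytes)
    {
        int byteCount = Math.Min(maxBytes, bytes.Length);
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] charBuffer = new char[Encoding.UTF8.GetMaxCharCount(byteCount)];
        int charCount = decoder.GetChars(bytes, 0, byteCount, charBuffer, 0);
        return new string(charBuffer, 0, charCount);
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("ab\u00E9"); // 61 62 C3 A9
        Console.WriteLine(DecodePrefix(data, 3));          // ab
        Console.WriteLine(DecodePrefix(data, 4).Length);   // 3
    }
}
```

For the question's case, DecodePrefix(bytes, 1024) returns at most the characters that fit completely within the first 1024 bytes.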

Of course, when a code point is split in the middle, the Decoder is left holding a "partial char" in its state, but that doesn't concern you in your case (and it is exactly what you want whenever you keep feeding it more data :)).
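To illustrate why keeping the partial char is useful when more data follows, here is a sketch with a hypothetical helper DecodeInChunks that feeds one Decoder fixed-size chunks. A character split across two chunks is completed on the next call; flush is passed as true only for the last chunk, so a truncated sequence at the very end of the input would still surface as U+FFFD rather than silently vanishing.

```csharp
using System;
using System.Text;

// Hypothetical streaming example; names are illustrative only.
public static class StreamingDecode
{
    // Decodes `bytes` in chunks of chunkSize (> 0) bytes with a single Decoder.
    // Partial characters at chunk boundaries are carried over to the next call.
    public static string DecodeInChunks(byte[] bytes, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        for (int offset = 0; offset < bytes.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, bytes.Length - offset);
            bool last = offset + count == bytes.Length;
            // GetCharCount does not change the decoder's state.
            char[] chars = new char[decoder.GetCharCount(bytes, offset, count, last)];
            int n = decoder.GetChars(bytes, offset, count, chars, 0, last);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string text = "na\u00EFve caf\u00E9";
        byte[] data = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(DecodeInChunks(data, 1) == text); // True
    }
}
```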

answered Sep 28 '22 at 00:09 by Luaan