
Is there a simple way to append a byte to a StringBuffer and specify the encoding?

Question

What is the simplest way to append a byte to a StringBuffer (i.e. cast a byte to a char) and specify the character encoding used (ASCII, UTF-8, etc)?

Context

I want to append a byte to a StringBuffer. Doing so requires casting the byte to a char:

myStringBuffer.append((char)nextByte);

However, the code above uses the default character encoding for my machine (which is MacRoman). Meanwhile, other components in the system/network require UTF-8. So I need to do something like:

try {
    myStringBuffer.append(new String(new byte[]{nextByte}, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    //handle error
}

Which, frankly, is pretty ugly.

Surely there's a better way (other than breaking the same code into multiple lines)?

gMale asked Apr 21 '11 01:04




2 Answers

I think the error here is in dealing with bytes at all. You want to deal with strings of characters instead.

Just interpose a reader on the input and output stream to do the mapping between bytes and characters for you. Use the InputStreamReader(InputStream in, CharsetDecoder dec) form of the constructor for the input, though, so that you can detect input encoding errors via an exception. Now you have strings of characters instead of buffers of bytes. Put an OutputStreamWriter on the other end.

Now you no longer have to worry about bytes or encodings. It’s much simpler this way.
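
A minimal sketch of the reader approach described above, assuming Java 7 or later (for StandardCharsets and try-with-resources) and UTF-8 input. The class name ReaderDecodeSketch and the method readAll are made up for illustration. The decoder is configured to report malformed input, so a bad byte surfaces as an IOException instead of being silently replaced.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the original answer.
public class ReaderDecodeSketch {

    // Reads all characters from a UTF-8 byte stream into a StringBuffer.
    // Malformed input raises a CharacterCodingException (a kind of IOException).
    static StringBuffer readAll(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuffer sb = new StringBuffer();
        try (Reader reader = new InputStreamReader(in, decoder)) {
            int c;
            // read() returns one UTF-16 code unit at a time, or -1 at end of stream
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb;
    }

    public static void main(String[] args) throws IOException {
        // "na\u00EFve" contains one character that takes two bytes in UTF-8
        byte[] utf8 = "na\u00EFve".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(utf8)).length());
    }
}
```

The reader keeps the partial-sequence state internally, which is why the calling code never sees individual bytes.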

tchrist answered Oct 20 '22 19:10


The simple answer is 'no'. What if the byte is the first byte of a multi-byte sequence? Nothing would maintain the state.

If you have all the bytes of a logical character in hand, you can do:

sb.append(new String(bytes, charset));

But if you have one byte of UTF-8, you can't do this at all with stock classes.
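
To illustrate the difference, here is a small sketch, assuming Java 7 or later so that StandardCharsets.UTF_8 can replace the "UTF-8" string and its checked UnsupportedEncodingException. The class and method names are hypothetical. It appends the two bytes of 'á' once as a complete sequence and once as two separate single-byte arrays.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class for illustration only.
public class WholeSequenceSketch {

    // Decodes the given bytes as UTF-8 and appends the result to the buffer.
    static void appendUtf8(StringBuffer sb, byte[] bytes) {
        sb.append(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // 0xC3 0xA1 is the complete UTF-8 encoding of U+00E1 ('á')
        StringBuffer whole = new StringBuffer();
        appendUtf8(whole, new byte[] {(byte) 0xC3, (byte) 0xA1});
        System.out.println(whole.toString().equals("\u00E1")); // true

        // Decoding each byte on its own loses the character: neither byte
        // is valid UTF-8 alone, so the String constructor substitutes
        // replacement characters instead of producing 'á'.
        StringBuffer split = new StringBuffer();
        appendUtf8(split, new byte[] {(byte) 0xC3});
        appendUtf8(split, new byte[] {(byte) 0xA1});
        System.out.println(split.toString().equals("\u00E1")); // false
    }
}
```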

It would not be terribly difficult to build a juiced-up StringBuffer that uses java.nio.charset classes to implement byte appending, but it would not be one or two lines of code.

Comments indicate that there's some basic Unicode knowledge needed here.

In UTF-8, 'a' is one byte, 'á' is two bytes, '丧' is three bytes, and '𝌎' is four bytes. The job of CharsetDecoder is to convert these sequences into Unicode characters. Viewed as a sequential operation over bytes, this is obviously a stateful process.
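
The byte counts above can be checked with a short sketch. The class and method names are made up for illustration, and the four-byte case uses U+1D30E as an assumed example of a character outside the Basic Multilingual Plane. Unicode escapes are used so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class for illustration only.
public class Utf8LengthSketch {

    // Returns how many bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));       // 1
        System.out.println(utf8Length("\u00E1"));  // 2 (U+00E1)
        System.out.println(utf8Length("\u4E27"));  // 3 (U+4E27)
        // A supplementary code point is two chars in Java (a surrogate pair)
        // but four bytes in UTF-8.
        String supplementary = new String(Character.toChars(0x1D30E));
        System.out.println(supplementary.length());     // 2
        System.out.println(utf8Length(supplementary));  // 4
    }
}
```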

If you create a CharsetDecoder for UTF-8, you can feed it one byte at a time (in a ByteBuffer) via its decode method. The UTF-16 characters will accumulate in the output CharBuffer.
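
One possible shape for such a byte-appending wrapper is sketched below. ByteAppender is a hypothetical class, not part of the JDK, and the fixed buffer sizes assume UTF-8 (at most four bytes per character, at most two chars produced per appended byte); other charsets may need larger buffers. The decoder leaves the bytes of an incomplete sequence unconsumed, so they are kept in a small pending buffer until the rest arrives.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; sized with UTF-8 in mind.
public class ByteAppender {
    private final StringBuffer sb = new StringBuffer();
    private final CharsetDecoder decoder;
    private final ByteBuffer pending = ByteBuffer.allocate(8); // undecoded bytes
    private final CharBuffer decoded = CharBuffer.allocate(4); // freshly decoded chars

    public ByteAppender(Charset charset) {
        // A new decoder reports malformed input by default.
        this.decoder = charset.newDecoder();
    }

    // Appends one byte; chars are emitted only once a sequence is complete.
    public ByteAppender append(byte b) throws CharacterCodingException {
        pending.put(b);
        pending.flip();
        CoderResult result = decoder.decode(pending, decoded, false);
        if (result.isError()) {
            result.throwException();
        }
        pending.compact(); // keep the bytes of any incomplete sequence
        drain();
        return this;
    }

    // Signals end of input; fails if a multi-byte sequence was left unfinished.
    public String finish() throws CharacterCodingException {
        pending.flip();
        CoderResult result = decoder.decode(pending, decoded, true);
        if (result.isError()) {
            result.throwException();
        }
        decoder.flush(decoded);
        drain();
        // Reset so the appender can accept a new run of bytes.
        decoder.reset();
        pending.clear();
        return sb.toString();
    }

    // Moves decoded chars into the StringBuffer and empties the char buffer.
    private void drain() {
        decoded.flip();
        sb.append(decoded);
        decoded.clear();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "a\u00E1\u4E27" + new String(Character.toChars(0x1D30E));
        ByteAppender appender = new ByteAppender(StandardCharsets.UTF_8);
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            appender.append(b);
        }
        System.out.println(appender.finish().equals(text)); // true
    }
}
```

The sketch does not try to recover after a decoding error: once append or finish throws, the instance should be discarded.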

bmargulies answered Oct 20 '22 20:10