Is there a simple way to append a byte to a StringBuffer and specify the encoding?

Question

What is the simplest way to append a byte to a StringBuffer (i.e. cast a byte to a char) and specify the character encoding used (ASCII, UTF-8, etc)?

Context

I want to append a byte to a StringBuffer. Doing so requires casting the byte to a char:

myStringBuffer.append((char)nextByte);

However, the code above uses the default character encoding for my machine (which is MacRoman). Meanwhile, other components in the system/network require UTF-8. So I need to do something like:

try {
    myStringBuffer.append(new String(new byte[]{nextByte}, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    //handle error
}

Which, frankly, is pretty ugly.

Surely there's a better way (other than breaking the same code into multiple lines)?


asked Apr 21 '11 01:04

gMale

2 Answers

I think the error here is in dealing with bytes at all. You want to deal with strings of characters instead.

Just interpose a reader on the input stream to do the mapping from bytes to characters for you. For the input, though, use the InputStreamReader(InputStream in, CharsetDecoder dec) constructor, so that input encoding errors are reported as an exception instead of being silently replaced. Now you have strings of characters instead of buffers of bytes. Put an OutputStreamWriter on the other end.

Now you no longer have to worry about bytes or encodings. It’s much simpler this way.
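A minimal sketch of this approach, assuming Java 7 or later (for StandardCharsets). The class and method names (TranscodeStreams, copy) are hypothetical; only InputStreamReader, OutputStreamWriter and Charset.newDecoder() come from the JDK. The decoder returned by newDecoder() reports malformed input by default, so bad bytes surface as a MalformedInputException rather than as U+FFFD.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the program handles characters, and the Reader/Writer
// pair does the byte <-> char mapping at the edges. The caller owns the streams.
class TranscodeStreams {
    static void copy(InputStream in, Charset inCharset,
                     OutputStream out, Charset outCharset) throws IOException {
        // newDecoder() uses CodingErrorAction.REPORT by default, so read()
        // throws MalformedInputException (an IOException) on bad input bytes.
        Reader reader = new InputStreamReader(in, inCharset.newDecoder());
        Writer writer = new OutputStreamWriter(out, outCharset);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n); // characters, not bytes, flow through here
        }
        writer.flush(); // push buffered characters through the encoder
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8,
             out, StandardCharsets.UTF_8);
        System.out.println(out.size()); // prints 5: three ASCII bytes plus two for U+00E9
    }
}
```

In production code the streams would normally be opened in a try-with-resources block by the caller.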


answered Oct 20 '22 19:10

tchrist

The simple answer is 'no'. What if the byte is the first byte of a multi-byte sequence? Nothing would hold on to that byte until the rest of the sequence arrives.

If you have all the bytes of a logical character in hand, you can do:

sb.append(new String(bytes, charset));
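For example, assuming Java 7 or later, the String(byte[], Charset) constructor with StandardCharsets.UTF_8 does this without the checked UnsupportedEncodingException from the question. The class and method names below are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AppendWholeCharacter {
    // Appends the characters encoded by a complete byte sequence.
    // Assumes `bytes` holds whole characters only; a partial sequence would be
    // replaced with U+FFFD rather than kept for later.
    static StringBuffer appendDecoded(StringBuffer sb, byte[] bytes, Charset charset) {
        return sb.append(new String(bytes, charset));
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("caf");
        byte[] bytes = { (byte) 0xC3, (byte) 0xA1 }; // both UTF-8 bytes of U+00E1 (a with acute)
        appendDecoded(sb, bytes, StandardCharsets.UTF_8);
        System.out.println(sb.toString().equals("caf\u00e1")); // prints true
    }
}
```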

But if you have one byte of UTF-8, you can't do this at all with stock classes.
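To see why, here is what the stock String constructor does with the two halves of a two-byte sequence when they are decoded separately (a small demonstration, assuming Java 7+; the class name is hypothetical). Each lone byte is malformed on its own, so each becomes U+FFFD and the original character is lost.

```java
import java.nio.charset.StandardCharsets;

class LoneUtf8Byte {
    static String decodeSeparately(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (byte b : bytes) {
            // Each byte is decoded on its own, so no state links a lead byte
            // to the continuation byte that follows it.
            sb.append(new String(new byte[] { b }, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] aAcute = { (byte) 0xC3, (byte) 0xA1 }; // UTF-8 for U+00E1
        String together = new String(aAcute, StandardCharsets.UTF_8);
        String separately = decodeSeparately(aAcute);
        System.out.println(together.equals("\u00e1"));          // prints true
        System.out.println(separately.equals("\uFFFD\uFFFD")); // prints true
    }
}
```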

It would not be terribly difficult to build a juiced-up StringBuffer that uses java.nio.charset classes to implement byte appending, but it would not be one or two lines of code.
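One possible shape for such a class, sketched under assumptions: the name ByteAppendingBuffer is hypothetical, it wraps a StringBuilder rather than extending StringBuffer (StringBuffer is final), and it replaces malformed input with U+FFFD instead of throwing. Bytes of an incomplete sequence stay in a small ByteBuffer until the rest of the sequence arrives.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-appending text buffer built on java.nio.charset.
class ByteAppendingBuffer {
    private final StringBuilder text = new StringBuilder();
    private final CharsetDecoder decoder;
    // Holds the bytes of a not-yet-complete sequence; ample for UTF-8 (max 4 bytes).
    private final ByteBuffer pending = ByteBuffer.allocate(16);
    private final CharBuffer chars = CharBuffer.allocate(16);

    ByteAppendingBuffer(Charset charset) {
        decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    ByteAppendingBuffer append(byte b) {
        pending.put(b);
        decodePending(false);
        return this;
    }

    // Call after the last byte: a truncated trailing sequence becomes U+FFFD,
    // and the decoder is reset so the buffer can be reused.
    ByteAppendingBuffer finish() {
        decodePending(true);
        CoderResult r;
        do {
            r = decoder.flush(chars);
            moveCharsToText();
        } while (r.isOverflow());
        decoder.reset();
        return this;
    }

    private void decodePending(boolean endOfInput) {
        pending.flip(); // switch the buffer to reading
        CoderResult r;
        do {
            // Complete sequences are consumed; a partial one is left in `pending`.
            r = decoder.decode(pending, chars, endOfInput);
            moveCharsToText();
        } while (r.isOverflow());
        pending.compact(); // keep leftover bytes and switch back to writing
    }

    private void moveCharsToText() {
        chars.flip();
        text.append(chars);
        chars.clear();
    }

    @Override
    public String toString() {
        return text.toString();
    }

    public static void main(String[] args) {
        ByteAppendingBuffer buf = new ByteAppendingBuffer(StandardCharsets.UTF_8);
        for (byte b : "caf\u00e9".getBytes(StandardCharsets.UTF_8)) {
            buf.append(b); // one byte at a time, as in the question
        }
        System.out.println(buf.finish().toString().equals("caf\u00e9")); // prints true
    }
}
```

Until finish() is called, toString() reflects only the complete characters seen so far; a half-received sequence is not visible yet.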

Comments indicate that there's some basic Unicode knowledge needed here.

In UTF-8, 'a' is one byte, 'á' is two bytes, '丧' is three bytes, and '𝌎' is four bytes. The job of CharsetDecoder is to convert these sequences into Unicode characters. Viewed as a sequential operation over bytes, this is obviously a stateful process.
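A quick check of those byte counts (a sketch; the class name is hypothetical). The four characters are written as \u escapes to avoid source-file encoding issues; U+1D30E lies outside the Basic Multilingual Plane, so in a Java String it is a surrogate pair of two chars.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The four sample characters from the text; the last is a surrogate pair in UTF-16.
        String[] samples = { "a", "\u00e1", "\u4e27", "\uD834\uDF0E" };
        for (String s : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", s.codePointAt(0), utf8Length(s));
        }
        // prints U+0061 -> 1, U+00E1 -> 2, U+4E27 -> 3, U+1D30E -> 4 (one per line)
    }
}
```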

If you create a CharsetDecoder for UTF-8, you can feed it one byte at a time (in a ByteBuffer) via its decode(ByteBuffer in, CharBuffer out, boolean endOfInput) method. The UTF-16 characters will accumulate in the output CharBuffer.
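A minimal sketch of that loop, assuming UTF-8 and Java 7+ (class and method names are hypothetical). It calls decode(ByteBuffer, CharBuffer, boolean) once per byte; a partial sequence stays in the ByteBuffer until its remaining bytes arrive. The decoder keeps its default REPORT action, so malformed input raises a CharacterCodingException.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ByteAtATimeDecode {
    static String decodeOneByteAtATime(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // reports errors by default
        ByteBuffer in = ByteBuffer.allocate(bytes.length);
        // UTF-8 never yields more UTF-16 chars than input bytes, so `out` cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        for (byte b : bytes) {
            in.put(b);
            in.flip();
            CoderResult r = decoder.decode(in, out, false); // false: more input may follow
            if (r.isError()) {
                r.throwException();
            }
            in.compact(); // keep an incomplete sequence for the next round
        }
        in.flip();
        CoderResult r = decoder.decode(in, out, true); // true: a leftover partial sequence is an error
        if (r.isError()) {
            r.throwException();
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeOneByteAtATime(utf8).equals("caf\u00e9")); // prints true
    }
}
```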

answered Oct 20 '22 20:10

bmargulies


Tags: java, char, character-encoding, utf-8, byte