
Java: String.getBytes(Charset) Vs. Charset.encode(String) for use with OutputStream

My algorithm has two inputs:

  • 1 utf8 String object that will be encoded
  • 1 Charset object which indicates what I need to encode the string into

In the end, the returned result will be put into an OutputStream, an action which may happen multiple times, but at least once. There is no multithreading happening in this scenario.

I have found two solutions:

  1. Calling getBytes(Charset) on the given String, supplying the given Charset. This returns a byte[].
  2. Calling encode(String) on the given Charset, supplying the given String. This returns a ByteBuffer.
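
For reference, the two options above can be sketched side by side. This is only an illustration: the class name EncodeOptions and both method names are hypothetical, and the ByteBuffer handling assumes nothing beyond the documented java.nio API. Note that with Charset.encode only the bytes between position() and limit() are valid, so the backing array must not be written out whole.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class EncodeOptions {

    // Option 1: String.getBytes(Charset) returns an exactly sized byte[].
    static void writeWithGetBytes(String text, Charset charset, OutputStream out)
            throws IOException {
        byte[] bytes = text.getBytes(charset);
        out.write(bytes);
    }

    // Option 2: Charset.encode(String) returns a ByteBuffer whose valid
    // content is the range [position, limit), not the whole backing array.
    static void writeWithEncode(String text, Charset charset, OutputStream out)
            throws IOException {
        ByteBuffer buffer = charset.encode(text);
        if (buffer.hasArray()) {
            // Heap buffer: write the valid slice of the backing array directly.
            out.write(buffer.array(),
                      buffer.arrayOffset() + buffer.position(),
                      buffer.remaining());
        } else {
            // No accessible array: copy the remaining bytes out first.
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            out.write(bytes);
        }
    }
}
```

Both methods should put the same bytes on the stream; the difference is whether the intermediate result is a byte[] or a ByteBuffer that has to be unwrapped.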

Delving into the code behind these methods shows a complex design for each underlying algorithm. I can't say I understand how to make a choice between these two options.

  1. Is there a significant performance difference between the two methods?
  2. Is there a significant performance difference for putting the result into the OutputStream?
  3. Is there a significant difference in footprint?

Which solution would generally be a better choice?

Asked by dammkewl on Mar 06 '26 at 00:03


1 Answer

In both cases, a byte[] is built dynamically to encode the string. A more efficient approach is to have the text written directly to the OutputStream, for example:

OutputStreamWriter osw = new OutputStreamWriter(out, StandardCharsets.UTF_8);
// look Mum, no byte[] needed
osw.write(text);
// flush so the writer's buffered bytes actually reach out
osw.flush();
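
Since the question supplies its own Charset and may write more than once, the snippet above could be adapted roughly as follows. This is a sketch under assumptions: the class StreamingEncode and the method writeAll are hypothetical names, and it assumes the caller owns the stream, so the writer is flushed rather than closed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class StreamingEncode {

    // Encodes every string with the caller's charset through one writer,
    // so no full-size byte[] is allocated per string.
    static void writeAll(Iterable<String> texts, Charset charset, OutputStream out)
            throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        for (String text : texts) {
            writer.write(text);
        }
        // Flush instead of close: closing the writer would also close out,
        // which is assumed here to belong to the caller.
        writer.flush();
    }
}
```

Reusing one writer for all writes keeps a single encoder and buffer alive instead of creating them per call.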

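An alternative would be to use DataOutputStream.writeUTF if you need a binary format. Note that it always writes a two-byte length prefix followed by modified UTF-8, so it does not honor an arbitrary Charset.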
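
A minimal round-trip sketch of the writeUTF option, assuming an in-memory stream for demonstration; the class name WriteUtfDemo and its methods are hypothetical. The matching reader is DataInputStream.readUTF, and writeUTF throws UTFDataFormatException when the encoded form exceeds 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper class, for illustration only.
final class WriteUtfDemo {

    // Writes a 2-byte big-endian length followed by modified UTF-8 bytes.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(sink);
        data.writeUTF(text);
        data.flush();
        return sink.toByteArray();
    }

    // Reads the length prefix, then decodes that many bytes back to a String.
    static String decode(byte[] bytes) throws IOException {
        DataInputStream data = new DataInputStream(new ByteArrayInputStream(bytes));
        return data.readUTF();
    }
}
```

Because the length is stored with the text, the reader knows where each string ends, which plain encoded bytes on a stream do not tell it.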

Answered by Peter Lawrey on Mar 07 '26 at 15:03


