I have a pretty interesting topic - at least for me. Given a ByteArrayOutputStream containing bytes in, for example, UTF-8, I need a function that can "translate" those bytes into another - new - ByteArrayOutputStream in, for example, UTF-16, ASCII, or you name it. My naive approach was to use an InputStreamReader and pass in the desired encoding, but that didn't work because it reads into a char[] and I can only write byte[] to the new BAOS.
public byte[] convertStream(Charset encoding) {
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);
    ByteArrayOutputStream converted = new ByteArrayOutputStream();
    int readCount;
    char[] buffer = new char[4096];
    while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1)
        converted.write(buffer, 0, readCount);
    return converted.toByteArray();
}
Now, this obviously doesn't work and I'm looking for a way to make this scenario possible, without building a String out of the byte[].
@Edit: Since the obvious seems rather hard to read: 1) raw is a ByteArrayOutputStream containing the bytes of a BINARY object sent to us by clients. The bytes usually arrive in UTF-8 as part of an HTTP message. 2) The goal here is to forward this BINARY data to an internal system that is not flexible (well, it is an internal system) and that accepts such attachments in UTF-16. I don't know why, so don't even ask; it just does.
So, to justify my question: is there a way to convert a byte array from charset A to charset B, or to an encoding of your choice? Once again, building a String is NOT what I'm after.
Thank you, and I hope that clears up the questionable parts :).
The '8' in UTF-8 signifies that it uses 8-bit blocks (code units) to denote a character. The number of blocks needed to represent a character varies from 1 to 4. To convert a String into UTF-8 in Java, we use the getBytes() method, passing StandardCharsets.UTF_8 as the charset. The getBytes() method encodes the String into a sequence of bytes and returns a byte array.
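To illustrate the 1-to-4-byte range, here is a minimal sketch that encodes four sample characters with getBytes(StandardCharsets.UTF_8) and reports the resulting lengths. The class name Utf8Lengths and the helper utf8Length are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Hypothetical helper: number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // U+0041 LATIN CAPITAL LETTER A
        System.out.println(utf8Length("\u00E9"));       // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        System.out.println(utf8Length("\u20AC"));       // U+20AC EURO SIGN
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600, a surrogate pair in Java
    }
}
```

The four calls should report 1, 2, 3 and 4 bytes respectively.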
More precisely, when a byte such as 0xC8 is cast to a char, it is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is narrowed down to 0xFFC8 by the cast to char, which is the positive number 65480.
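The sign extension described above can be reproduced in a few lines. This is only an illustrative sketch; the class ByteToChar and its two helpers are names invented here, and the byte 0xC8 is inferred from the value 0xFFFFFFC8 quoted above. Masking with 0xFF is the usual way to treat the byte as unsigned instead.

```java
public class ByteToChar {
    // Plain cast: the byte is widened to int with sign extension, then narrowed to char.
    static char signExtended(byte b) {
        return (char) b;
    }

    // Masking first keeps only the low 8 bits, so the byte is treated as unsigned.
    static char masked(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC8; // -56 as a signed byte
        System.out.println(Integer.toHexString(b)); // ffffffc8
        System.out.println((int) signExtended(b));  // 65480
        System.out.println((int) masked(b));        // 200
    }
}
```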
As mentioned in comments, I'd just convert to a string:
String text = new String(raw.toByteArray(), encoding);
byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
However, if that's not feasible (for some unspecified reason...) what you've got now is nearly there - you just need to add an OutputStreamWriter into the mix:
// Nothing here should throw IOException in reality - work out what you want to do.
public byte[] convertStream(Charset encoding) throws IOException {
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);
    int readCount;
    char[] buffer = new char[4096];
    try (ByteArrayOutputStream converted = new ByteArrayOutputStream()) {
        try (Writer writer = new OutputStreamWriter(converted, StandardCharsets.UTF_8)) {
            while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
        return converted.toByteArray();
    }
}
Note that you're still creating an extra temporary copy of the data in memory, admittedly in UTF-8 rather than UTF-16... but fundamentally this is hardly any more efficient than creating a string.
If memory efficiency is a particular concern, you could perform multiple passes in order to work out how many bytes will be required, create a byte array of the right length, and then adjust the code to write straight into that byte array.
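One way the multi-pass idea might look is sketched below. Everything here is an assumption for illustration rather than part of the original code: the class TwoPassTranscoder, its two small OutputStream subclasses, and the choice to take the source bytes and both charsets as parameters. The first pass only counts the encoded bytes; the second pass writes them into an array of exactly that size. IOException is wrapped because in-memory streams should not throw it in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class TwoPassTranscoder {
    // Hypothetical sink that discards the data and only remembers how much was written.
    private static final class CountingOutputStream extends OutputStream {
        int count;
        @Override public void write(int b) { count++; }
        @Override public void write(byte[] b, int off, int len) { count += len; }
    }

    // Hypothetical sink that writes straight into a preallocated array.
    private static final class FixedArrayOutputStream extends OutputStream {
        private final byte[] target;
        private int pos;
        FixedArrayOutputStream(byte[] target) { this.target = target; }
        @Override public void write(int b) { target[pos++] = (byte) b; }
        @Override public void write(byte[] b, int off, int len) {
            System.arraycopy(b, off, target, pos, len);
            pos += len;
        }
    }

    // Decodes the input in 4096-char chunks and re-encodes each chunk into the sink.
    private static void pump(byte[] input, Charset source, OutputStream sink, Charset target)
            throws IOException {
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), source);
             Writer writer = new OutputStreamWriter(sink, target)) {
            int readCount;
            while ((readCount = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
    }

    public static byte[] transcode(byte[] input, Charset source, Charset target) {
        try {
            // Pass 1: work out how many bytes the target encoding needs.
            CountingOutputStream counter = new CountingOutputStream();
            pump(input, source, counter, target);
            // Pass 2: encode again, this time into an array of exactly that length.
            byte[] result = new byte[counter.count];
            pump(input, source, new FixedArrayOutputStream(result), target);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as TwoPassTranscoder.transcode(raw.toByteArray(), StandardCharsets.UTF_8, StandardCharsets.UTF_16) would then cover the scenario from the question. The trade-off is that the input is decoded twice, so this spends CPU time to avoid a growing intermediate buffer.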