Is there a way to create a StringBuilder from a byte[]?
I want to improve memory usage using StringBuilder, but what I have first is a byte[], so I have to create a String from the byte[] and then create the StringBuilder from the String, and I don't see this solution as optimal.
Thanks
Basically, your best option seems to be using CharsetDecoder directly.
Here's how:
byte[] srcBytes = getYourSrcBytes();
//Whatever charset your bytes are encoded in
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
//ByteBuffer.wrap simply wraps the byte array, it does not allocate new memory for it
ByteBuffer srcBuffer = ByteBuffer.wrap(srcBytes);
//Now, we decode our srcBuffer into a new CharBuffer (yes, new memory is allocated here, that cannot be avoided)
CharBuffer resBuffer = decoder.decode(srcBuffer);
//CharBuffer implements the CharSequence interface, which StringBuilder fully supports in its methods
StringBuilder yourStringBuilder = new StringBuilder(resBuffer);
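One difference from the String route is worth knowing: a decoder obtained from newDecoder() reports malformed input by throwing a CharacterCodingException, while new String(bytes, charset) silently substitutes the replacement character. Below is a minimal sketch of that difference, assuming UTF-8 input; the class and method names (StrictDecodeSketch, decodeStrict, isWellFormed) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class StrictDecodeSketch {

    // Strict route: a fresh CharsetDecoder throws on malformed or unmappable input.
    static StringBuilder decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharBuffer chars = StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
        return new StringBuilder(chars);
    }

    // Hypothetical convenience wrapper: true if the bytes decode without error.
    static boolean isWellFormed(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'a', (byte) 0xFF, 'b'}; // 0xFF never appears in valid UTF-8

        System.out.println("good is well-formed: " + isWellFormed(good));
        System.out.println("bad is well-formed:  " + isWellFormed(bad));
        // The String constructor does not throw; it replaces the bad byte instead.
        System.out.println("String route length: " + new String(bad, StandardCharsets.UTF_8).length());
    }
}
```

If you want the decoder to behave like the String constructor, call onMalformedInput(CodingErrorAction.REPLACE) and onUnmappableCharacter(CodingErrorAction.REPLACE) on it before decoding.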
ADDED:
After some tests, it seems that the simple new String(bytes) is much faster, and there seems to be no simple way to make it faster than that. Here is the test I ran:
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.text.ParseException;
public class ConsoleMain {
    public static void main(String[] args) throws IOException, ParseException {
        StringBuilder sb1 = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
        for (int i=0;i<19;i++) {
            sb1.append(sb1);
        }
        System.out.println("Size of buffer: "+sb1.length());
        byte[] src = sb1.toString().getBytes("UTF-8");
        StringBuilder res;
        long startTime = System.currentTimeMillis();
        res = testStringConvert(src);
        System.out.println("Conversion using String time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }
        startTime = System.currentTimeMillis();
        res = testCBConvert(src);
        System.out.println("Conversion using CharBuffer time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }
    }
    private static StringBuilder testStringConvert(byte[] src) throws UnsupportedEncodingException {
        String s = new String(src, "UTF-8");
        StringBuilder b = new StringBuilder(s);
        return b;
    }
    private static StringBuilder testCBConvert(byte[] src) throws CharacterCodingException {
        Charset charset = Charset.forName("UTF-8");
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer srcBuffer = ByteBuffer.wrap(src);
        CharBuffer resBuffer = decoder.decode(srcBuffer);
        StringBuilder b = new StringBuilder(resBuffer);
        return b;
    }
}
Results:
Size of buffer: 13631488
Conversion using String time (msec): 91
Conversion using CharBuffer time (msec): 252
And a modified (less memory-consuming) version on IDEONE: Here.
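Keep in mind that the numbers above come from a single cold run of each variant, so JIT compilation and class loading are part of what is being measured. A rough sketch of a slightly fairer comparison is to repeat each conversion several times and keep the best time. The class name, helper names, and round count below are illustrative assumptions, and the CharBuffer variant uses Charset.decode rather than an explicit CharsetDecoder, so it is not identical to the test above. For serious measurements a benchmarking harness would be the better tool.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class ConversionTimingSketch {

    static StringBuilder viaString(byte[] src) {
        return new StringBuilder(new String(src, StandardCharsets.UTF_8));
    }

    static StringBuilder viaCharBuffer(byte[] src) {
        return new StringBuilder(StandardCharsets.UTF_8.decode(ByteBuffer.wrap(src)));
    }

    // Runs the conversion `rounds` times and returns the best time in nanoseconds.
    static long bestNanos(Function<byte[], StringBuilder> conversion, byte[] src, int rounds) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            StringBuilder result = conversion.apply(src);
            long elapsed = System.nanoTime() - start;
            // Touch the result so the conversion cannot be optimized away entirely.
            if (result.length() == 0 && src.length > 0) {
                throw new AssertionError("empty result");
            }
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
        for (int i = 0; i < 16; i++) {
            text.append(text); // doubles the content each round
        }
        byte[] src = text.toString().getBytes(StandardCharsets.UTF_8);

        int rounds = 20; // arbitrary; early rounds double as warm-up
        System.out.println("String route, best ns:     "
                + bestNanos(ConversionTimingSketch::viaString, src, rounds));
        System.out.println("CharBuffer route, best ns: "
                + bestNanos(ConversionTimingSketch::viaCharBuffer, src, rounds));
    }
}
```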
If it is short statements you want, then there is no way around the String step in between. The String constructor mixes conversion and object construction for convenience in a very common case, but there is no such convenience constructor for a StringBuilder.
If it is performance you are interested in, then you might avoid the intermediate String object by using something like this:
new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)))
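As a self-contained illustration of the one-liner, here is a small sketch that wraps it in a helper and checks it against the String route. The names DirectDecodeSketch and toStringBuilder are invented for this example, and StandardCharsets.UTF_8 stands in for whatever charset your bytes actually use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DirectDecodeSketch {

    // Hypothetical helper: decodes straight into a StringBuilder, no intermediate String.
    static StringBuilder toStringBuilder(byte[] inBytes, Charset charset) {
        return new StringBuilder(charset.decode(ByteBuffer.wrap(inBytes)));
    }

    public static void main(String[] args) {
        byte[] inBytes = "gr\u00fc\u00df dich".getBytes(StandardCharsets.UTF_8);

        StringBuilder direct = toStringBuilder(inBytes, StandardCharsets.UTF_8);
        StringBuilder viaString = new StringBuilder(new String(inBytes, StandardCharsets.UTF_8));

        // StringBuilder does not override equals, so compare the contents.
        System.out.println(direct.toString().contentEquals(viaString));
    }
}
```

Note that the CharBuffer returned by decode still holds its own copy of the characters, so this avoids the String object but not a temporary second copy of the text.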
If you want to be able to fine-tune performance, you can control the decode process yourself. For example, you might want to avoid using too much memory, by using averageCharsPerByte as an estimate of how much memory will be needed. Instead of resizing the buffer if that estimate was too short, you could use the resulting StringBuilder to accumulate all the parts.
CharsetDecoder cd = Charset.forName(charsetName).newDecoder();
cd.onMalformedInput(CodingErrorAction.REPLACE);
cd.onUnmappableCharacter(CodingErrorAction.REPLACE);
int lengthEstimate = (int) Math.ceil(cd.averageCharsPerByte()*inBytes.length) + 1;
ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
CharBuffer outBuf = CharBuffer.allocate(lengthEstimate);
StringBuilder out = new StringBuilder(lengthEstimate);
CoderResult cr;
while (true) {
    cr = cd.decode(inBuf, outBuf, true);
    //flip so that append reads the chars just decoded, not the unused rest of the buffer
    outBuf.flip();
    out.append(outBuf);
    outBuf.clear();
    if (cr.isUnderflow()) break;
    if (!cr.isOverflow()) cr.throwException();
}
cr = cd.flush(outBuf);
if (!cr.isUnderflow()) cr.throwException();
outBuf.flip();
out.append(outBuf);
I doubt that the above code will be worth the effort in most applications, though. If an application is that interested in performance, it probably shouldn't be dealing with StringBuilder either, but handle everything at the buffer level.
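For anyone who does want to try the chunked approach, here is a runnable sketch of the same loop with a deliberately small, reusable scratch buffer, so that the overflow branch actually gets exercised. The class name and the chunkChars parameter are assumptions made for this example. The minimum of 2 chars is there because a character outside the BMP decodes to a surrogate pair, and a smaller buffer would never make progress on it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ChunkedDecodeSketch {

    // chunkChars is a hypothetical tuning knob: the size of the reusable scratch buffer.
    static StringBuilder decode(byte[] inBytes, Charset charset, int chunkChars)
            throws CharacterCodingException {
        if (chunkChars < 2) {
            throw new IllegalArgumentException("chunkChars must be at least 2");
        }
        CharsetDecoder cd = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
        CharBuffer outBuf = CharBuffer.allocate(chunkChars);
        // Pre-size the builder from the decoder's estimate; it grows if the estimate is short.
        StringBuilder out = new StringBuilder(
                (int) Math.ceil(cd.averageCharsPerByte() * inBytes.length) + 1);

        while (true) {
            CoderResult cr = cd.decode(inBuf, outBuf, true);
            outBuf.flip();       // expose the chars written in this pass
            out.append(outBuf);  // copy them into the builder
            outBuf.clear();      // reuse the scratch buffer
            if (cr.isUnderflow()) break;               // all input consumed
            if (!cr.isOverflow()) cr.throwException(); // anything else is an error
        }
        CoderResult cr = cd.flush(outBuf);
        if (!cr.isUnderflow()) cr.throwException();
        outBuf.flip();
        out.append(outBuf);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "h\u00e9llo \uD83D\uDE00 w\u00f6rld";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder decoded = decode(bytes, StandardCharsets.UTF_8, 4);
        System.out.println(decoded.toString().equals(text));
    }
}
```

With this layout the scratch buffer stays at a fixed size regardless of the input length, and only the StringBuilder grows.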