 

Create StringBuilder from byte[]

Tags: java, memory

Is there a way to create a StringBuilder from a byte[]?

I want to improve memory usage by using a StringBuilder, but what I start with is a byte[]. So I have to create a String from the byte[] and then create the StringBuilder from that String, and I don't see this solution as optimal.

Thanks

asked Jun 20 '12 at 07:06 by manash



2 Answers

Basically, your best option seems to be using CharsetDecoder directly.

Here's how:

byte[] srcBytes = getYourSrcBytes();

//Whatever charset your bytes are encoded in
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();

//ByteBuffer.wrap simply wraps the byte array, it does not allocate new memory for it
ByteBuffer srcBuffer = ByteBuffer.wrap(srcBytes);
//Now, we decode our srcBuffer into a new CharBuffer (yes, new memory allocated here, no can do)
CharBuffer resBuffer = decoder.decode(srcBuffer);

//CharBuffer implements the CharSequence interface, which StringBuilder fully supports in its constructor and methods
StringBuilder yourStringBuilder = new StringBuilder(resBuffer);
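The snippet above can be packaged as a small runnable sketch. It assumes Java 7+ so that StandardCharsets.UTF_8 can stand in for Charset.forName("UTF-8"); the class and method names are hypothetical. Worth knowing: a decoder obtained from newDecoder() reports malformed input by default, so invalid bytes make decode throw a CharacterCodingException instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class DecoderToBuilder {
    // Hypothetical helper: decodes UTF-8 bytes into a StringBuilder without an intermediate String.
    static StringBuilder fromUtf8(byte[] srcBytes) throws CharacterCodingException {
        // newDecoder() uses CodingErrorAction.REPORT, so malformed bytes throw here
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // wrap() shares the array; decode() allocates the CharBuffer holding the result
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(srcBytes));
        // StringBuilder(CharSequence) copies the decoded chars into its own array
        return new StringBuilder(chars);
    }

    public static void main(String[] args) throws CharacterCodingException {
        StringBuilder sb = fromUtf8("h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        sb.append('!');
        System.out.println(sb.length()); // prints 6
    }
}
```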

ADDED:

After some tests, it seems that the simple new String(bytes, "UTF-8") is much faster (91 ms vs. 252 ms in the run below), and there seems to be no simple way to beat it. Here is the test I ran:

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.text.ParseException;

public class ConsoleMain {
    public static void main(String[] args) throws IOException, ParseException {
        StringBuilder sb1 = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
        for (int i=0;i<19;i++) {
            sb1.append(sb1);
        }
        System.out.println("Size of buffer: "+sb1.length());
        byte[] src = sb1.toString().getBytes("UTF-8");
        StringBuilder res;

        long startTime = System.currentTimeMillis();
        res = testStringConvert(src);
        System.out.println("Conversion using String time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }

        startTime = System.currentTimeMillis();
        res = testCBConvert(src);
        System.out.println("Conversion using CharBuffer time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }
    }

    private static StringBuilder testStringConvert(byte[] src) throws UnsupportedEncodingException {
        String s = new String(src, "UTF-8");
        StringBuilder b = new StringBuilder(s);
        return b;
    }

    private static StringBuilder testCBConvert(byte[] src) throws CharacterCodingException {
        Charset charset = Charset.forName("UTF-8");
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer srcBuffer = ByteBuffer.wrap(src);
        CharBuffer resBuffer = decoder.decode(srcBuffer);
        StringBuilder b = new StringBuilder(resBuffer);
        return b;
    }
}

Results:

Size of buffer: 13631488
Conversion using String time (msec): 91
Conversion using CharBuffer time (msec): 252

And a modified (less memory-consuming) version on IDEONE: Here.

answered Oct 15 '22 at 19:10 by bezmax



If it is short statements you want, then there is no way around the String step in between. The String constructor mixes conversion and object construction for convenience in a very common case, but there is no such convenience constructor for a StringBuilder.
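Since the JDK offers no such constructor, the short route can at least be hidden behind a helper of your own. A minimal sketch, where Builders and fromBytes are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Builders {
    // Hypothetical convenience helper: StringBuilder has no (byte[], Charset)
    // constructor, so the bytes pass through an intermediate String first.
    static StringBuilder fromBytes(byte[] bytes, Charset charset) {
        return new StringBuilder(new String(bytes, charset));
    }

    public static void main(String[] args) {
        StringBuilder sb = fromBytes("abc".getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        sb.append("def");
        System.out.println(sb); // prints abcdef
    }
}
```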

If it is performance you are interested in, then you might avoid the intermediate String object by using something like this:

new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)))
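A runnable sketch of that one-liner, wrapped in a hypothetical class. Note that Charset.decode(ByteBuffer) always replaces malformed or unmappable input (with U+FFFD for UTF-8) rather than throwing, which differs from a decoder obtained via newDecoder():

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class OneLiner {
    // Charset.decode(ByteBuffer) always REPLACEs bad input, so this never
    // throws a CharacterCodingException.
    static StringBuilder decode(byte[] inBytes, String charsetName) {
        return new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)));
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', (byte) 0xFF, 'b'}; // 0xFF never occurs in valid UTF-8
        StringBuilder sb = decode(bytes, "UTF-8");
        System.out.println(sb.length());              // prints 3
        System.out.println(sb.charAt(1) == '\uFFFD'); // prints true
    }
}
```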

If you want to be able to fine-tune performance, you can control the decode process yourself. For example, to avoid using too much memory, you can use averageCharsPerByte() as an estimate of how many chars will be needed. If that estimate turns out too short, then instead of resizing the buffer you can use the resulting StringBuilder to accumulate all the parts.

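// Assumes byte[] inBytes and String charsetName are in scope,
// plus imports of java.nio.ByteBuffer, java.nio.CharBuffer and java.nio.charset.*
CharsetDecoder cd = Charset.forName(charsetName).newDecoder();
cd.onMalformedInput(CodingErrorAction.REPLACE);
cd.onUnmappableCharacter(CodingErrorAction.REPLACE);
// Math.ceil returns a double, so the estimate has to be cast to int
int lengthEstimate = (int) Math.ceil(cd.averageCharsPerByte() * inBytes.length) + 1;
ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
CharBuffer outBuf = CharBuffer.allocate(lengthEstimate);
StringBuilder out = new StringBuilder(lengthEstimate);
CoderResult cr;
while (true) {
    cr = cd.decode(inBuf, outBuf, true);
    outBuf.flip();      // switch outBuf from writing to reading before appending
    out.append(outBuf);
    outBuf.clear();     // reuse the same buffer for the next part
    if (cr.isUnderflow()) break;               // all input consumed
    if (!cr.isOverflow()) cr.throwException(); // anything but "buffer full" is an error
}
cr = cd.flush(outBuf);
if (!cr.isUnderflow()) cr.throwException();
outBuf.flip();
out.append(outBuf);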
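To see the accumulate-the-parts idea in isolation, here is a sketch that deliberately uses a tiny fixed-size CharBuffer, so the decoder overflows repeatedly and each filled chunk is moved into the StringBuilder. It assumes UTF-8 input; ChunkedDecode and chunkSize are hypothetical names, and chunkSize must be at least 2 because a supplementary character decodes to a surrogate pair.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ChunkedDecode {
    // Hypothetical helper: decodes UTF-8 through a small fixed-size CharBuffer and
    // accumulates every filled chunk in the StringBuilder instead of growing the buffer.
    static StringBuilder decode(byte[] inBytes, int chunkSize) throws CharacterCodingException {
        // A surrogate pair needs two free chars; with one, the loop could never progress.
        if (chunkSize < 2) throw new IllegalArgumentException("chunkSize must be >= 2");
        CharsetDecoder cd = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.wrap(inBytes);
        CharBuffer chunk = CharBuffer.allocate(chunkSize);
        // UTF-8 never decodes to more chars than it has bytes, so this capacity suffices.
        StringBuilder out = new StringBuilder(inBytes.length);
        while (true) {
            CoderResult cr = cd.decode(in, chunk, true);
            chunk.flip();      // switch the chunk from writing to reading
            out.append(chunk); // copy the decoded chars into the builder
            chunk.clear();     // make the whole chunk writable again
            if (cr.isUnderflow()) break;               // all input consumed
            if (!cr.isOverflow()) cr.throwException(); // defensive: errors are REPLACEd above
        }
        CoderResult cr = cd.flush(chunk);
        if (!cr.isUnderflow()) cr.throwException();
        chunk.flip();
        out.append(chunk);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = "hello, chunked world".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(bytes, 4)); // prints hello, chunked world
    }
}
```

The chunk size only trades loop iterations for buffer memory; the result is the same for any value of 2 or more.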

I doubt that the above code will be worth the effort in most applications, though. If an application is that interested in performance, it probably shouldn't be dealing with StringBuilder either, but handle everything at the buffer level.
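As one illustration of working at the buffer level: the decoded CharBuffer is itself a CharSequence, so APIs such as java.util.regex can consume it directly with no String or StringBuilder copy. A sketch, where counting words is just a hypothetical stand-in for real processing:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BufferLevel {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical example: counts words by running a regex over the decoded
    // CharBuffer itself, without building a String or StringBuilder.
    static int countWords(byte[] bytes) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        Matcher m = WORD.matcher(chars);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = "buffers all the way down".getBytes(StandardCharsets.UTF_8);
        System.out.println(countWords(bytes)); // prints 5
    }
}
```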

answered Oct 15 '22 at 17:10 by MvG
