How to detect end of string in byte array to string conversion?

I receive from a socket a string in a byte array which looks like:

[128,5,6,3,45,0,0,0,0,0]

The size given by the network protocol is the total length of the string (including the zeros), so in my example 10.

If I simply do:

String myString = new String(myBuffer); 

I get 5 incorrect characters at the end of the string. The conversion doesn't seem to detect the end-of-string character (0).

To get the correct size and the correct string, I do this:

int sizeLabelTmp = 0;
// Iterate over the 10 bytes to find the real length of the string
for(int j = 0; j<(sizeLabel); j++) {
    byte charac = datasRec[j];
    if(charac == 0)
        break;
    sizeLabelTmp ++;
}
// Create a temp byte array to make a correct conversion
byte[] label    = new byte[sizeLabelTmp];
for(int j = 0; j<(sizeLabelTmp); j++) {
    label[j] = datasRec[j];
}
String myString = new String(label);

Is there a better way to handle the problem?

Thanks

asked Nov 04 '11 by grunk


2 Answers

Maybe it's too late, but it may help others. The simplest thing you can do is new String(myBuffer).trim(), which gives you exactly what you want.
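
For example, a minimal sketch (the buffer contents below are made up for illustration, and the platform default charset is assumed):

byte[] myBuffer = {72, 101, 108, 108, 111, 0, 0, 0, 0, 0}; // "Hello" followed by five zero bytes
String myString = new String(myBuffer).trim();             // trim() drops the trailing '\u0000' characters
System.out.println(myString + " / length=" + myString.length()); // prints: Hello / length=5

One caveat: trim() removes every leading and trailing character whose code point is U+0020 or below, so it strips genuine leading/trailing spaces, tabs and newlines as well as the padding zeros.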

answered Nov 18 '22 by Yuvi


0 isn't an "end of string character". It's just a byte. Whether or not it only comes at the end of the string depends on what encoding you're using (and what the text can be). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
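
To illustrate that point, here's a small sketch using java.nio.charset.StandardCharsets (available since Java 7):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Every ASCII character becomes two bytes in UTF-16BE, one of them zero
byte[] utf16 = "AB".getBytes(StandardCharsets.UTF_16BE);
System.out.println(Arrays.toString(utf16)); // prints [0, 65, 0, 66]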

If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:

int size = 0;
while (size < data.length)
{
    if (data[size] == 0)
    {
        break;
    }
    size++;
}

// Specify the appropriate encoding as the last argument.
// StandardCharsets.UTF_8 (java.nio.charset, Java 7+) avoids the checked
// UnsupportedEncodingException you'd have to handle with the "UTF-8" literal.
String myString = new String(data, 0, size, StandardCharsets.UTF_8);

I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.

If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
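
One possible shape for that, as a sketch rather than your actual protocol (it assumes a 4-byte big-endian length prefix followed by UTF-8 encoded bytes, and the method names are just for illustration):

import java.io.*;
import java.nio.charset.StandardCharsets;

// Sender: write the length prefix, then the encoded bytes
static void writeString(OutputStream out, String s) throws IOException {
    byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
    DataOutputStream dataOut = new DataOutputStream(out);
    dataOut.writeInt(encoded.length);   // 4-byte big-endian length
    dataOut.write(encoded);
    dataOut.flush();
}

// Receiver: read the length, then exactly that many bytes
static String readString(InputStream in) throws IOException {
    DataInputStream dataIn = new DataInputStream(in);
    int length = dataIn.readInt();
    byte[] encoded = new byte[length];
    dataIn.readFully(encoded);          // throws EOFException if the stream was truncated
    return new String(encoded, StandardCharsets.UTF_8);
}

This way the receiver never over-reads, and a truncated message shows up as an exception instead of a silently mangled string.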

answered Nov 18 '22 by Jon Skeet