
Encoding a byte array to a String and decoding it back without data loss

I tried to convert byte[] to string as follows:

Map<String, String> biomap = new HashMap<String, String>();
biomap.put("L1", new String(Lf1, "ISO-8859-1"));

where Lf1 is a byte[] array. The problem is that when I convert the byte array to a string, it comes out like:

FMR  F P�d@� �0d@r (@� ......... etc

I then convert the string back to a byte[]:

String SF1 = biomap.get("L1");
byte[] storedL1 = SF1.getBytes("ISO-8859-1");

When I compare this byte array with the original one, the comparison returns false, which I take to mean the data has changed.

I want to get back exactly the same byte[] data after encoding it to a string and decoding it to a byte[] again.

Kumar Gaurav Sharma asked Dec 24 '22 17:12


2 Answers

First: ISO-8859-1 does not cause any data loss when an arbitrary byte array is converted to a string with this encoding, because it maps each of the 256 possible byte values to exactly one character. Consider the following program:

public class BytesToString {
    public static void main(String[] args) throws Exception {
        // array that will contain all the possible byte values
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            bytes[i] = (byte) (i + Byte.MIN_VALUE);
        }

        // converting to string and back to bytes
        String str = new String(bytes, "ISO-8859-1");
        byte[] newBytes = str.getBytes("ISO-8859-1");

        if (newBytes.length != 256) {
            throw new IllegalStateException("Wrong length");
        }
        boolean mismatchFound = false;
        for (int i = 0; i < 256; i++) {
            if (newBytes[i] != bytes[i]) {
                System.out.println("Mismatch: " + bytes[i] + "->" + newBytes[i]);
                mismatchFound = true;
            }
        }
        System.out.println("Whether a mismatch was found: " + mismatchFound);
    }
}

It builds an array of bytes with all possible byte values, then it converts it to String using ISO-8859-1 and then back to bytes using the same encoding.

This program outputs Whether a mismatch was found: false, so bytes->String->bytes conversion via ISO-8859-1 yields the same data as it was in the beginning.
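
If the round trip is lossless, a false result can come from the comparison itself rather than from the conversion. The question does not show how the two arrays were compared, so this is only a guess: in Java, == and equals() on arrays compare references, not contents. A minimal sketch, with made-up sample bytes and a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    // Decodes the bytes as ISO-8859-1 and encodes the result again.
    static byte[] roundTrip(byte[] original) {
        String asText = new String(original, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Made-up sample data, including a zero byte and high-bit bytes.
        byte[] original = {0x46, 0x4D, 0x52, 0x00, (byte) 0x80, (byte) 0xFF};
        byte[] restored = roundTrip(original);

        // Reference comparison: two distinct array objects.
        System.out.println(original.equals(restored));         // false
        // Content comparison: element by element.
        System.out.println(Arrays.equals(original, restored)); // true
    }
}
```

Using the StandardCharsets constant instead of the "ISO-8859-1" string also avoids the checked UnsupportedEncodingException.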

However, as was pointed out in the comments, String is not a good container for binary data. Such a string will almost certainly contain unprintable characters, so if you print it, embed it in HTML, or pass it through any other channel that expects text, you may run into problems such as data loss.

If you really need to convert a byte array to a string (and treat that string as opaque), use Base64 encoding (java.util.Base64, available since Java 8):

String stringRepresentation = Base64.getEncoder().encodeToString(bytes);
byte[] decodedBytes = Base64.getDecoder().decode(stringRepresentation);

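It takes about a third more space (4 characters for every 3 bytes), but the resulting string contains only printable ASCII characters, so it is safe to print or pass through text-only channels.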
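
Applied to the map from the question, this could look roughly like the sketch below. The sample bytes, the class name, and the encode/decode helper names are assumptions made for illustration; only biomap, the "L1" key, and the Base64 calls come from the thread.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class Base64MapExample {
    // Hypothetical helpers wrapping the java.util.Base64 calls shown above.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Made-up stand-in for the binary data in the question.
        byte[] lf1 = {0x46, 0x4D, 0x52, 0x00, (byte) 0x80, (byte) 0xFF};

        Map<String, String> biomap = new HashMap<>();
        biomap.put("L1", encode(lf1));

        // The stored value is plain ASCII and can be printed safely.
        System.out.println(biomap.get("L1"));

        byte[] storedL1 = decode(biomap.get("L1"));
        System.out.println(Arrays.equals(lf1, storedL1)); // true
    }
}
```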

Roman Puchkovskiy answered Jan 25 '23 22:01


There are special encodings, such as Base64, for passing binary data through text-only systems.

Converting a byte[] to a String is only guaranteed to work if the byte[] contains a valid sequence of bytes according to the chosen encoding. Invalid byte sequences might be replaced with the Unicode replacement character (as shown in your example).
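
To illustrate the replacement behaviour, here is a small sketch using UTF-8, an encoding in which some byte sequences are invalid (ISO-8859-1 has none, since every byte maps to a character). The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeExample {
    // Decodes the bytes as UTF-8 and encodes the result again.
    static byte[] roundTripUtf8(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF never occurs in valid UTF-8.
        byte[] original = {(byte) 0xFF};

        String decoded = new String(original, StandardCharsets.UTF_8);
        // The invalid byte was replaced with U+FFFD.
        System.out.println(decoded.equals("\uFFFD")); // true

        // Encoding U+FFFD yields three bytes, so the original byte is gone.
        byte[] restored = roundTripUtf8(original);
        System.out.println(Arrays.toString(restored)); // [-17, -65, -67]
    }
}
```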

ooxi answered Jan 25 '23 23:01