
Problems converting byte array to string and back to byte array

There are a lot of questions on this topic with the same solution, but it doesn't work for me. I have a simple test with encryption. The encryption/decryption itself works, as long as I handle the data as a byte array and not as a String. The problem is that I don't want to handle it as a byte array but as a String: when I encode the byte array to a String and back, the resulting byte array differs from the original, so the decryption no longer works. I tried the following charset names in the corresponding String methods: UTF-8, UTF8, and UTF-16. None of them works; the resulting byte array always differs from the original. Any ideas why this is so?

Encrypter:

    import java.security.Key;
    import java.security.NoSuchAlgorithmException;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.NoSuchPaddingException;

    public class NewEncrypter {
        private String algorithm = "DESede";
        private Key key = null;
        private Cipher cipher = null;

        public NewEncrypter() throws NoSuchAlgorithmException, NoSuchPaddingException
        {
            key = KeyGenerator.getInstance(algorithm).generateKey();
            cipher = Cipher.getInstance(algorithm);
        }

        public byte[] encrypt(String input) throws Exception
        {
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] inputBytes = input.getBytes("UTF-16");
            return cipher.doFinal(inputBytes);
        }

        public String decrypt(byte[] encryptionBytes) throws Exception
        {
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] recoveredBytes = cipher.doFinal(encryptionBytes);
            String recovered = new String(recoveredBytes, "UTF-16");
            return recovered;
        }
    }

This is the test where I try it:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class NewEncrypterTest {
        @Test
        public void canEncryptAndDecrypt() throws Exception
        {
            String toEncrypt = "FOOBAR";
            NewEncrypter encrypter = new NewEncrypter();

            byte[] encryptedByteArray = encrypter.encrypt(toEncrypt);
            System.out.println("encryptedByteArray:" + encryptedByteArray);

            String decoded = new String(encryptedByteArray, "UTF-16");
            System.out.println("decoded:" + decoded);

            byte[] encoded = decoded.getBytes("UTF-16");
            System.out.println("encoded:" + encoded);

            String decryptedText = encrypter.decrypt(encoded); // Exception here
            System.out.println("decryptedText:" + decryptedText);

            assertEquals(toEncrypt, decryptedText);
        }
    }
Bevor asked Feb 01 '12 14:02




1 Answer

It is not a good idea to store encrypted data in a String, because Strings are meant for human-readable text, not for arbitrary binary data. For binary data, byte[] is the right type.

However, if you must do it, use an encoding with a 1-to-1 mapping between bytes and characters, i.e. one where every byte sequence maps to a unique sequence of characters, and back. One such encoding is ISO-8859-1:

    String decoded = new String(encryptedByteArray, "ISO-8859-1");
    System.out.println("decoded:" + decoded);

    byte[] encoded = decoded.getBytes("ISO-8859-1");
    System.out.println("encoded:" + java.util.Arrays.toString(encoded));

    String decryptedText = encrypter.decrypt(encoded);

Other common encodings that don't lose data are hexadecimal and Base64. Before Java 8 the standard API did not define classes for them, so you needed a helper library; since Java 8, java.util.Base64 is part of the standard API.
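As a minimal sketch of that approach (assuming Java 8+ for java.util.Base64, with made-up sample bytes standing in for ciphertext), the round trip preserves every byte exactly:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    public static void main(String[] args) {
        // Arbitrary bytes standing in for encrypted output, including
        // values that are not valid in many text encodings
        byte[] original = {(byte) 0x00, (byte) 0xFF, (byte) 0x7F, (byte) 0x80};

        // Encode the raw bytes as a plain ASCII-safe String
        String text = Base64.getEncoder().encodeToString(original);
        System.out.println("encoded: " + text); // AP9/gA==

        // Decode back: the result is byte-for-byte identical to the input
        byte[] restored = Base64.getDecoder().decode(text);
        System.out.println("round trip ok: " + Arrays.equals(original, restored)); // true
    }
}
```

Unlike ISO-8859-1, the Base64 string contains only printable ASCII, so it is also safe to put in URLs (with the URL-safe variant), JSON, or config files.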

With UTF-16 the program would fail for two reasons:

  1. String.getBytes("UTF-16") adds a byte-order mark (BOM) to the output to identify the byte order. Use UTF-16LE or UTF-16BE to avoid this.
  2. Not all byte sequences can be mapped to characters in UTF-16. First, text encoded in UTF-16 must have an even number of bytes. Second, UTF-16 encodes Unicode characters beyond U+FFFF as surrogate pairs: some sequences of 4 bytes map to a single Unicode character. For this to be possible, the first 2 bytes of such a sequence encode no character on their own in UTF-16.
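The first point above is easy to observe directly. A small sketch (using the single character "A" as an illustrative input) shows the extra BOM bytes that break the round trip:

```java
public class Utf16Bom {
    public static void main(String[] args) throws Exception {
        // "UTF-16" prepends a 2-byte BOM (0xFE 0xFF) before the character data
        byte[] withBom = "A".getBytes("UTF-16");
        // "UTF-16BE" writes the character data only, with no BOM
        byte[] noBom = "A".getBytes("UTF-16BE");

        System.out.println("UTF-16   length: " + withBom.length); // 4
        System.out.println("UTF-16BE length: " + noBom.length);   // 2
    }
}
```

So even before any invalid surrogate sequences come into play, a decode/encode cycle through "UTF-16" can change the length and content of the byte array.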
Joni answered Sep 17 '22 23:09
