
Java and .NET: Base64 conversion confusion

Tags: java, .net, base64

I'm having trouble converting text to a Base64 string in Java (Android) and .NET (Visual Basic). Plain, readable ASCII characters convert fine. But special characters (characters whose code is greater than 127) are giving me trouble.

For example, I tried converting the character whose ASCII code is 65 (the character "A").

My Java code is:

char a = 65;
String c = String.valueOf(a); 
byte bt[] = c.getBytes();               
String result = Base64.encodeToString(bt, Base64.DEFAULT);

And my .NET code is:

Dim c As String = Chr(65)
Dim result as String = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(c))

Both return the same result: "QQ==". This is fine. But when I try converting a special character, for example character code 153, the two platforms return different results.

char a = 153;
String c = String.valueOf(a);               
byte bt[] = c.getBytes();               
String result = Base64.encodeToString(bt, Base64.DEFAULT);

This returns "wpk="

And my same .NET code:

Dim c As String = Chr(153) 
Dim result as String = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(c))

This returns "4oSi"

This is so strange. What's wrong here? I'm using the native Base64 libraries on both platforms. Is something wrong with my code?

Faraz Azhar asked Oct 22 '12 18:10


1 Answer

The data you are encoding is encrypted data: random bytes where any value from 0 to 255 can occur and which, in their encrypted state, have no character or text meaning. You need to treat this information as what we can call true binary data. Both Java and .NET fully support true binary data through their respective byte array types.

As you know, Base64 encoding converts true binary data (each byte ranging from 0 to 255) into a larger array of binary data, about one third bigger, in which every byte is guaranteed to have the value of a printable ASCII character somewhere between 32 and 126. Let's call this encoded binary. The encoded binary can then safely be converted to text because virtually every known character set agrees on the printable ASCII range (32 to 126).
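The claim that every possible byte value ends up as printable ASCII can be checked with a small Java sketch. It assumes `java.util.Base64` (Java 8 and later) instead of the `android.util.Base64` class used in the question, and the class and method names are made up for illustration.

```java
import java.util.Base64;

public class PrintableCheck {
    // Returns true if every byte value 0..255, encoded on its own,
    // produces only printable ASCII characters (codes 32 to 126).
    static boolean allOutputsPrintable() {
        for (int value = 0; value <= 255; value++) {
            byte[] raw = { (byte) value };
            byte[] encoded = Base64.getEncoder().encode(raw);
            for (byte b : encoded) {
                if (b < 32 || b > 126) {
                    return false;
                }
            }
        }
        return true;
    }

    // Length of the Base64 text produced for an input of the given size.
    static int encodedLength(int inputBytes) {
        return Base64.getEncoder().encodeToString(new byte[inputBytes]).length();
    }

    public static void main(String[] args) {
        System.out.println(allOutputsPrintable());
        // Three input bytes become four output characters.
        System.out.println(encodedLength(3));
    }
}
```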

So the main problem with both the Java and VB.NET snippets is that you are attempting to use text primitives (char and String in Java, String in VB.NET) to store true binary data. Once you do that, it's too late. There is no way to reliably convert the text back to byte arrays, because the text primitives are simply not designed to safely store and retrieve binary data. For more on why this is so, please read Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
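The two results in the question can be traced back to the text step. The Java sketch below is one way to reproduce them; it rests on two assumptions that the question does not state: that the Android `getBytes()` call used UTF-8, and that the VB.NET machine used the Windows-1252 code page, where `Chr(153)` is the trademark sign U+2122. It uses `java.util.Base64` instead of `android.util.Base64`, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodingMismatch {
    // Turns text into UTF-8 bytes and Base64-encodes them,
    // mirroring what both snippets in the question do.
    static String utf8ThenBase64(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Encodes a single raw byte with no text step in between.
    static String rawByteBase64(int value) {
        return Base64.getEncoder().encodeToString(new byte[] { (byte) value });
    }

    public static void main(String[] args) {
        // Java: (char) 153 is the control character U+0099, UTF-8 bytes C2 99.
        System.out.println(utf8ThenBase64("\u0099")); // wpk=
        // VB.NET (assumed Windows-1252): Chr(153) is U+2122, UTF-8 bytes E2 84 A2.
        System.out.println(utf8ThenBase64("\u2122")); // 4oSi
        // The single byte 0x99 itself.
        System.out.println(rawByteBase64(153)); // mQ==
    }
}
```

In both cases one intended byte became two or three different bytes before Base64 ever saw it, so neither result represents the byte 153.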

Fortunately the fix is simple. For Java, don't use char and String to store binary data. Put the data directly into a byte array. Try the following:

  byte [] bt = new byte[1];
  bt[0] = (byte) 153;
  String result = Base64.encodeToString(bt, Base64.DEFAULT);

I get mQ==

The fix is conceptually the same in VB.NET. Don't use String. Use a byte array.

    Dim bytes() As Byte = New Byte() {153}
    Dim result As String = Convert.ToBase64String(bytes)

Again, the answer is mQ==

Finally, after the encoding, it's perfectly fine to use Strings. The encoded characters are all in the ASCII subset, so converting between String and byte array will not corrupt the data, because virtually all character sets agree on the ASCII subset.

Remember that you will have the same issue in the reverse direction, when decoding. You will be decoding to a byte array, at which point you are back to true binary. From then on, the data must never be stored as a string until you have finished processing it, for example by decrypting it back to the original clear text.
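A round trip that keeps the data in byte arrays on both sides might look like the following Java sketch. It again assumes `java.util.Base64` (Java 8 and later), and the class and method names are only illustrative.

```java
import java.util.Arrays;
import java.util.Base64;

public class RoundTrip {
    // byte[] to Base64 text: safe to store or send as a String.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Base64 text back to byte[]: true binary again, keep it out of Strings.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = { (byte) 153, 0, (byte) 255, 65 };
        String text = encode(original);
        byte[] restored = decode(text);
        // Compare the arrays byte for byte instead of going through String.
        System.out.println(Arrays.equals(original, restored));
    }
}
```

On Android, the corresponding calls in `android.util.Base64` would be `Base64.encodeToString(bytes, flags)` and `Base64.decode(text, flags)`; the same rule applies there, since the decoded result is a byte array.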

Hope this helps.

Guido Simone answered Sep 28 '22 17:09