
Why does a byte array's size not equal a string's size?

I'm trying to understand how a byte array's size can be smaller than a string's. I know each character of a string is 2 bytes or so, but even that math doesn't add up. Can someone shed some light on this for me, please?

The following:

byte[] myBytes = Encoding.ASCII.GetBytes("12345");
string myString = Convert.ToBase64String(myBytes);
Debug.WriteLine("Size of byte array: " + myBytes.Length);
Debug.WriteLine("Size of string: " + myString.Length);

Returns:

Size of byte array: 5

Size of string: 8

Arvo Bowen asked Dec 08 '22 20:12



1 Answer

The sizes/lengths do match, but only if you use a 1:1 encoding.

First, you seem to be a bit confused as to what encoding is. Remember that bytes are just numbers (in the range 0-255) and are the only thing a computer can store. Those numbers don't mean anything to humans beyond their numeric value. Because we wanted to be able to store text, we had to come up with ways to map these numbers to readable (and some not so readable) characters. These mappings are called encodings.

You converted your bytes to a string with Base64, which has overhead: every 3 input bytes become 4 output characters, and the output is padded with = up to a multiple of 4 (see Base64 length calculation?). Your 5 bytes therefore become 8 characters, and that overhead is causing your difference.
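To make the arithmetic concrete, here is a minimal sketch that compares the actual output of Convert.ToBase64String with the padded-length formula 4 * ceil(n / 3). The class name Base64LengthDemo and the helper Base64Length are hypothetical names introduced for illustration only; they are not part of .NET.

```csharp
using System;
using System.Text;

public static class Base64LengthDemo
{
    // Hypothetical helper (not a framework API): length of padded Base64
    // output for byteCount input bytes, i.e. 4 * ceil(byteCount / 3).
    public static int Base64Length(int byteCount) => 4 * ((byteCount + 2) / 3);

    public static void Main()
    {
        byte[] myBytes = Encoding.ASCII.GetBytes("12345");
        string encoded = Convert.ToBase64String(myBytes);

        Console.WriteLine(encoded);                       // MTIzNDU=
        Console.WriteLine(encoded.Length);                // 8
        Console.WriteLine(Base64Length(myBytes.Length));  // 8
    }
}
```

The trailing = is padding: 5 bytes fill one full 3-byte group plus a partial one, and each group always produces 4 characters.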

If you used Encoding.ASCII instead:

byte[] myBytes = Encoding.ASCII.GetBytes("12345");
string myString = Encoding.ASCII.GetString(myBytes);
Console.WriteLine("Size of byte array: " + myBytes.Length);
Console.WriteLine("Size of string: " + myString.Length);

You get, as expected:

Size of byte array: 5

Size of string: 5

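The reason to use Base64 (even with its overhead) is that it can encode any byte array into printable characters, which is required when sending the data through a text-only channel (say, in a URL, where the +, / and = of standard Base64 still need escaping). Decoding arbitrary bytes as ASCII, by contrast, yields unprintable control characters for values below 32, and values above 127 are not valid ASCII at all.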
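The difference shows up as soon as the bytes are not plain text. The following is a small sketch, with arbitrary sample bytes and hypothetical helper names (SurvivesBase64, SurvivesAscii), that round-trips the same array through Base64 and through Encoding.ASCII:

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: true if the bytes come back unchanged via Base64.
    public static bool SurvivesBase64(byte[] raw) =>
        raw.SequenceEqual(Convert.FromBase64String(Convert.ToBase64String(raw)));

    // Hypothetical helper: true if the bytes come back unchanged via ASCII.
    public static bool SurvivesAscii(byte[] raw) =>
        raw.SequenceEqual(Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(raw)));

    public static void Main()
    {
        // Arbitrary sample data: a control byte, a letter, and two bytes above 127.
        byte[] raw = { 0x00, 0x41, 0xC8, 0xFF };

        Console.WriteLine(SurvivesBase64(raw)); // True
        // Bytes above 127 are replaced by a substitute character when decoded
        // as ASCII, so the original values are lost.
        Console.WriteLine(SurvivesAscii(raw));  // False
    }
}
```

For input that really is 7-bit text, such as "12345", both round trips succeed, which is why the ASCII example above gives matching lengths.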

Also note that a .NET string is stored in memory as UTF-16, where a char is two bytes, but Length counts chars, not bytes. That is why your number isn't double like you expected in the question.
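As a quick illustration of chars versus bytes, this sketch prints the string's Length next to the byte counts that a few encodings would produce for the same text. The class name LengthVsBytesDemo is made up for the example.

```csharp
using System;
using System.Text;

public static class LengthVsBytesDemo
{
    public static void Main()
    {
        string s = "12345";

        Console.WriteLine(s.Length);                          // 5 (chars, not bytes)
        Console.WriteLine(Encoding.ASCII.GetByteCount(s));    // 5
        Console.WriteLine(Encoding.UTF8.GetByteCount(s));     // 5
        Console.WriteLine(Encoding.Unicode.GetByteCount(s));  // 10 (UTF-16, 2 bytes per char here)
    }
}
```

Only the UTF-16 byte count is doubled; Length itself stays at 5 regardless of how the string would be encoded.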

BradleyDotNET answered Dec 20 '22 21:12
