I have a question: which Unicode encoding should I use when encoding a .NET string into base64? I know strings are UTF-16 encoded on Windows, so is my way of encoding the right one?
public static String ToBase64String(this String source)
{
    return Convert.ToBase64String(Encoding.Unicode.GetBytes(source));
}
To convert a string into Base64 text, the following steps should be followed: Get the byte value of each character in the string (for plain English text, its ASCII value). Write each byte as its 8-bit binary equivalent. Regroup the resulting bit stream from 8-bit chunks into 6-bit chunks. Map each 6-bit value (0 to 63) to the corresponding character of the Base64 alphabet.
Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. Each Base64 digit represents exactly 6 bits of data, which means that four 6-bit Base64 digits can represent three bytes (24 bits).
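The regrouping described above can be sketched by hand for a single three-byte group. This is only an illustration: the class name Base64Regrouping and the method EncodeTriple are hypothetical, padding is not handled, and in real code Convert.ToBase64String does all of this for you.

```csharp
using System;
using System.Text;

public static class Base64Regrouping
{
    // Standard Base64 alphabet: index 0..63 maps to one output character.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical helper: encodes exactly three bytes as four Base64 digits.
    public static string EncodeTriple(byte b0, byte b1, byte b2)
    {
        // Concatenate the three bytes into one 24-bit value.
        int bits = (b0 << 16) | (b1 << 8) | b2;

        // Split the 24 bits into four 6-bit groups and look each one up.
        char[] digits =
        {
            Alphabet[(bits >> 18) & 0x3F],
            Alphabet[(bits >> 12) & 0x3F],
            Alphabet[(bits >> 6) & 0x3F],
            Alphabet[bits & 0x3F],
        };
        return new string(digits);
    }

    public static void Main()
    {
        byte[] bytes = Encoding.ASCII.GetBytes("Man"); // 77, 97, 110
        Console.WriteLine(EncodeTriple(bytes[0], bytes[1], bytes[2])); // TWFu
        Console.WriteLine(Convert.ToBase64String(bytes));              // TWFu
    }
}
```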
ToBase64String(Byte[]) Converts an array of 8-bit unsigned integers to its equivalent string representation that is encoded with base-64 digits.
ToBase64String(Byte[], Base64FormattingOptions) Converts an array of 8-bit unsigned integers to its equivalent string representation that is encoded with base-64 digits. You can specify whether to insert line breaks in the return value.
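As a rough illustration of the difference between the two overloads, the sketch below encodes the same 100 zero bytes with and without Base64FormattingOptions.InsertLineBreaks. The class name and the input data are made up for the example.

```csharp
using System;

public static class LineBreakDemo
{
    public static void Main()
    {
        byte[] data = new byte[100]; // 100 zero bytes, arbitrary sample input

        string oneLine = Convert.ToBase64String(data);
        string wrapped = Convert.ToBase64String(
            data, Base64FormattingOptions.InsertLineBreaks);

        Console.WriteLine(oneLine.Length);          // 136 (4 digits per 3 bytes, rounded up)
        Console.WriteLine(oneLine.Contains("\r\n")); // False
        Console.WriteLine(wrapped.IndexOf("\r\n"));  // 76 (break after every 76 characters)
    }
}
```

A decoder that does not tolerate whitespace would need the single-line form, so the default overload is usually the safer choice when the consumer is unknown.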
Q: Why does an = get appended at the end? A: As a short answer: the = sign is padding. Base64 encodes input in groups of three bytes, so when the number of input bytes is not a multiple of three, the final group is padded: one = is appended if two bytes remain, and two = signs if only one byte remains.
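A quick way to see the padding rule is to encode inputs of three, two, and one byte. The snippet below is a small sketch with a made-up class name; the sample strings are arbitrary.

```csharp
using System;
using System.Text;

public static class PaddingDemo
{
    public static void Main()
    {
        foreach (string text in new[] { "Man", "Ma", "M" })
        {
            byte[] bytes = Encoding.ASCII.GetBytes(text);
            Console.WriteLine($"{bytes.Length} byte(s): {Convert.ToBase64String(bytes)}");
        }
        // 3 byte(s): TWFu   (complete group, no padding)
        // 2 byte(s): TWE=   (one padding character)
        // 1 byte(s): TQ==   (two padding characters)
    }
}
```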
What you've provided is perfectly functional. It will produce a base64-encoded string of the bytes of your source string encoded in UTF-16.
If you're asking whether UTF-16 can represent any character in your string, then yes. The main difference between UTF-16 and UTF-32 is that UTF-16 is a variable-length encoding: it uses two bytes for characters in the Basic Multilingual Plane and four bytes (a surrogate pair) for all other characters, whereas UTF-32 always uses four bytes.
There are no Unicode characters that cannot be represented by UTF-16.
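To make the variable-length point concrete, here is a small sketch comparing a character inside the Basic Multilingual Plane with one outside it. The class name and the sample characters are chosen only for illustration.

```csharp
using System;
using System.Text;

public static class SurrogatePairDemo
{
    public static void Main()
    {
        string bmp = "A";            // U+0041, inside the Basic Multilingual Plane
        string clef = "\U0001D11E";  // U+1D11E MUSICAL SYMBOL G CLEF, outside it

        Console.WriteLine(bmp.Length);                      // 1 UTF-16 code unit
        Console.WriteLine(clef.Length);                     // 2 UTF-16 code units
        Console.WriteLine(char.IsSurrogatePair(clef, 0));   // True

        Console.WriteLine(Encoding.Unicode.GetByteCount(bmp));   // 2 bytes in UTF-16
        Console.WriteLine(Encoding.Unicode.GetByteCount(clef));  // 4 bytes in UTF-16
        Console.WriteLine(Encoding.UTF32.GetByteCount(bmp));     // 4 bytes in UTF-32
        Console.WriteLine(Encoding.UTF32.GetByteCount(clef));    // 4 bytes in UTF-32
    }
}
```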
Be aware that you don't have to use UTF-16 just because that's what .NET strings use. When you create that byte array, you're free to choose any encoding that will handle all the characters in your string. For example, UTF-8 would be more efficient if the text is in a Latin-based language, but it can still handle every known character.
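As a rough illustration of the size difference for Latin-script text, the sketch below encodes the same string both ways. The class name and sample text are arbitrary; for text that is mostly outside the ASCII range the comparison can come out differently.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    public static void Main()
    {
        string text = "Hello, world"; // 12 ASCII characters

        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        Console.WriteLine(utf16.Length); // 24 bytes
        Console.WriteLine(utf8.Length);  // 12 bytes

        Console.WriteLine(Convert.ToBase64String(utf16).Length); // 32 characters
        Console.WriteLine(Convert.ToBase64String(utf8).Length);  // 16 characters
    }
}
```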
The most important concern is that whatever software decodes the base64 string needs to know which encoding to apply to the byte array to re-create the original string.
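One way to keep the two sides in agreement is to pair the encoder with a matching decoder that takes the encoding explicitly. The following is a minimal sketch, not the only way to do it: the class Base64TextExtensions and the overloads that accept an Encoding argument are hypothetical names introduced here.

```csharp
using System;
using System.Text;

public static class Base64TextExtensions
{
    // Hypothetical helper: encode text to base64 using the given encoding.
    public static string ToBase64String(this string source, Encoding encoding)
        => Convert.ToBase64String(encoding.GetBytes(source));

    // Hypothetical helper: decode base64 back to text using the given encoding.
    public static string FromBase64String(this string base64, Encoding encoding)
        => encoding.GetString(Convert.FromBase64String(base64));
}

public static class RoundTripDemo
{
    public static void Main()
    {
        string original = "Hi";
        string encoded = original.ToBase64String(Encoding.Unicode);

        // Decoding with the same encoding restores the original text.
        string sameEncoding = encoded.FromBase64String(Encoding.Unicode);
        Console.WriteLine(sameEncoding == original); // True

        // Decoding with a different encoding does not throw here,
        // but the result is not the original string.
        string wrongEncoding = encoded.FromBase64String(Encoding.UTF8);
        Console.WriteLine(wrongEncoding == original); // False
    }
}
```

Because a mismatch can fail silently like this, it helps to document the chosen encoding wherever the base64 string is exchanged.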