
What's the difference between UTF-8/UTF-16 and Base64 in terms of encoding?

In C#

We can use the classes below for encoding:

  • System.Text.Encoding.UTF8
  • System.Text.Encoding.Unicode (UTF-16)
  • System.Text.Encoding.ASCII

Why is there no System.Text.Encoding.Base64?

We can only use the Convert.ToBase64String and Convert.FromBase64String methods. What is special about Base64?

Can I say Base64 is the same kind of encoding as UTF-8? Or is UTF-8 a kind of Base64?

Zhongmin asked Oct 05 '10 17:10


People also ask

What is the difference between UTF-8 and UTF-16?

Both UTF-8 and UTF-16 are variable-length encodings. However, in UTF-8 a character occupies a minimum of 8 bits, while in UTF-16 a character occupies a minimum of 16 bits. The main advantage of UTF-8: basic ASCII characters such as digits and unaccented Latin letters take a single byte each.

Should I use UTF-8 or UTF-16?

If your data is mostly in Western languages and you want to reduce the amount of storage needed, go with UTF-8, since for those languages it takes about half the storage of UTF-16.
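As a rough check of the storage claim, here is a minimal Python 3 sketch that counts encoded bytes. The sample strings are made up for illustration, and `utf-16-le` is used so that no byte order mark is added to the count.

```python
# Sample strings are illustrative only, not taken from the question.
samples = {
    "English": "Hello, world",
    "Chinese": "你好，世界",
}

# Map each label to (UTF-8 byte count, UTF-16 byte count).
# "utf-16-le" avoids the 2-byte byte order mark that "utf-16" would prepend.
sizes = {
    name: (len(s.encode("utf-8")), len(s.encode("utf-16-le")))
    for name, s in samples.items()
}

for name, (utf8_len, utf16_len) in sizes.items():
    print(name, "UTF-8:", utf8_len, "bytes,", "UTF-16:", utf16_len, "bytes")
```

ASCII-only text comes out at half the size in UTF-8, while the CJK sample is smaller in UTF-16, so the advice depends on the kind of text being stored.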

What is UTF-16 encoding?

UTF-16 is an encoding of Unicode in which each character is represented by either one or two 16-bit code units. Unicode was originally designed as a pure 16-bit encoding, aimed at representing all modern scripts.

What encoding is Base64?

Base64 is a binary-to-text encoding scheme that represents binary data as an ASCII string. It is designed to carry binary data across channels that only reliably handle text. It takes any form of data and transforms it into a long string of plain text.



2 Answers

UTF-8 and UTF-16 are methods to encode Unicode strings to byte sequences.

See: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Base64 is a method to encode a byte sequence to a string.

So, these are widely different concepts and should not be confused.

Things to keep in mind:

  • Not every byte sequence represents a Unicode string encoded in UTF-8 or UTF-16.

  • Not every Unicode string represents a byte sequence encoded in Base64.
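The two directions and the two caveats above can be sketched in Python 3, where `str` (text) and `bytes` are separate types. This is only an illustration and not part of the original answer; the sample values are made up. In C# the rough counterparts would be Encoding.UTF8.GetBytes and Convert.ToBase64String.

```python
import base64
import binascii

text = "héllo"

# UTF-8 maps text (str) to bytes.
utf8_bytes = text.encode("utf-8")

# Base64 maps bytes to ASCII-only text.
b64_text = base64.b64encode(utf8_bytes).decode("ascii")

# Round trip: Base64 text -> bytes -> Unicode text.
assert base64.b64decode(b64_text).decode("utf-8") == text

# Caveat 1: not every byte sequence is valid UTF-8.
try:
    b"\xff\xfe\xfd".decode("utf-8")
    invalid_utf8_rejected = False
except UnicodeDecodeError:
    invalid_utf8_rejected = True

# Caveat 2: not every string is valid Base64.
try:
    base64.b64decode("not base64!", validate=True)
    invalid_base64_rejected = False
except binascii.Error:
    invalid_base64_rejected = True
```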

dtb answered Oct 07 '22 21:10


Base64 is a way to encode binary data as text, while UTF-8 and UTF-16 are ways to encode Unicode text as bytes. Note that in a language like Python 2.x, where binary data and strings are mixed, you can encode strings into Base64 or UTF-16 the same way:

    u'abc'.encode('utf16')
    u'abc'.encode('base64')

But in languages with a more well-defined separation between the two types of data, the two ways of representing data are generally handled by quite different utilities, which keeps the concerns separate.
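Python 3 is one such language, so it makes a convenient sketch of that separation (this example is an addition, not part of the original answer). Text encodings live on `str.encode`, while Base64 lives in the separate `base64` module and operates on bytes.

```python
import base64

text = "abc"

# Text encoding: str -> bytes. "utf-16-le" is used to avoid a byte order mark.
utf16_bytes = text.encode("utf-16-le")

# Base64: bytes -> bytes (ASCII only), so the text must be encoded first.
b64_bytes = base64.b64encode(text.encode("utf-8"))

# Unlike Python 2, asking str.encode for "base64" is rejected,
# because it is not a text encoding.
try:
    text.encode("base64")
    str_base64_allowed = True
except LookupError:
    str_base64_allowed = False
```

This mirrors the C# split in the question: System.Text.Encoding handles text-to-bytes conversions, and Convert handles Base64.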

Mike Axiak answered Oct 07 '22 22:10