
What is the difference between charsets and character encoding

What is the difference between charsets and character encoding? When I say I am using UTF-8 encoding, what is my charset? Is Unicode taken as the charset by default?

Neeraj asked Mar 18 '10 11:03




2 Answers

UTF-8 is an encoding of the Unicode character set. So if you're using UTF-8, the character set is Unicode, but you're unlikely to have to specify that separately anywhere. The other main encoding of Unicode is UTF-16, which is rarely put into 8-bit byte streams because its encoded form contains zero bytes (every ASCII-range character gets one), and those break software that treats a zero byte as end-of-string. If you are dealing with Unicode in a byte sequence, it is most likely encoded as UTF-8.
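To make the zero-byte point concrete, here is a minimal Python sketch. The sample string and the variable names are only illustrative, and the `split` call is a rough stand-in for how C-style string handling would cut the data short.

```python
# The same ASCII-range text encoded two ways.
text = "Hi"

utf8 = text.encode("utf-8")        # one byte per ASCII character
utf16 = text.encode("utf-16-le")   # two bytes per character, high byte is zero here

print(utf8.hex(" "))    # 48 69
print(utf16.hex(" "))   # 48 00 69 00

# Rough imitation of code that stops at the first zero byte (C-style strings):
seen_by_c_code = utf16.split(b"\x00")[0]
print(seen_by_c_code)   # b'H' -> the rest of the text is lost
```

UTF-8 never produces a zero byte unless the text itself contains U+0000, which is one reason it is the usual choice for files, pipes, and network protocols.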

Other than Unicode, character sets are usually considered to have a single fixed encoding, so terms like character set, charset, codepage, and encoding are often used interchangeably, or according to the vendor's preference. This is sloppy but harmless in practice, because for those sets a single name identifies both the characters and their byte representation.
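As a small illustration of such single-encoding character sets, the Python sketch below decodes one byte under two legacy codepages. The byte value and the choice of `latin-1` and `cp437` are just examples; any pair of single-byte codepages would do.

```python
# One byte, interpreted under different single-byte character sets.
raw = b"\xe9"

as_latin1 = raw.decode("latin-1")   # ISO 8859-1: U+00E9, e with acute accent
as_cp437 = raw.decode("cp437")      # IBM PC codepage 437: U+0398, Greek capital theta

print(as_latin1, as_cp437)

# The same lone byte is not valid UTF-8 at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```

For these codepages the name alone fixes both which characters exist and which byte stands for each, which is why "charset" and "encoding" get used as synonyms.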

The only exceptions I can think of are East Asian: the JIS character sets were originally given multiple encodings (ISO-2022-JP, Shift_JIS, and EUC-JP all encode JIS X 0208), but in practice today each encoding is just treated separately.
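The East Asian case can be seen with Python's built-in codecs. This is only a sketch: the sample character is arbitrary, and the codec names are Python's own spellings rather than anything the answer specifies.

```python
# One character from the JIS X 0208 set, encoded three different ways.
ch = "あ"  # HIRAGANA LETTER A

encoded = {name: ch.encode(name) for name in ("shift_jis", "euc_jp", "iso2022_jp")}

for name, raw in encoded.items():
    # Each codec yields different bytes, yet decodes back to the same character.
    print(f"{name:12} {raw.hex(' ')}")
    assert raw.decode(name) == ch
```

The three byte sequences differ, so a reader has to know which encoding was used, not just that the text is "JIS".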

Joseph Boyle answered Oct 05 '22 02:10


Character set: the definition of which character has which numeric code point (ASCII, JIS, Unicode)

Encoding: the definition of how a numeric code point is physically represented as bytes (UTF-8, UTF-16, UCS-2, Shift_JIS)
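The two definitions map onto two separate steps, sketched here in Python. The euro sign is just a convenient example and the variable names are illustrative.

```python
ch = "€"

# Character set step: which number does Unicode assign to this character?
code_point = ord(ch)
print(f"U+{code_point:04X}")   # U+20AC

# Encoding step: how is that number written out as bytes?
as_utf8 = ch.encode("utf-8")
as_utf16 = ch.encode("utf-16-be")
as_utf32 = ch.encode("utf-32-be")

print(as_utf8.hex(" "))    # e2 82 ac
print(as_utf16.hex(" "))   # 20 ac
print(as_utf32.hex(" "))   # 00 00 20 ac

# Whatever the byte form, decoding recovers the same code point.
assert ord(as_utf8.decode("utf-8")) == code_point
```

The code point stays the same across all three; only its byte representation changes.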

devio answered Oct 05 '22 01:10