The Difference Between Unicode and UTF-8

Unicode is a character set. UTF-8 is an encoding. Unicode is a list of characters with unique decimal numbers (code points): A = 65, B = 66, C = 67, and so on.
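As a quick illustration in Python (any language with Unicode strings would do), ord() looks up a character's code point, while encode() applies a particular encoding to turn it into bytes:

```python
# Code points are numbers assigned to characters by the Unicode character set;
# an encoding such as UTF-8 decides how those numbers become bytes.
print(ord('A'))             # 65 -- the code point, same value as in ASCII
print(chr(0x20AC))          # '€' -- the character at code point U+20AC
print('€'.encode('utf-8'))  # b'\xe2\x82\xac' -- the same character as UTF-8 bytes
```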
ASCII (/ˈæskiː/ ASS-kee), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices.
The purpose of the charset parameter is to specify the encoding of the external script in cases where the encoding is not specified at the HTTP protocol level. It is not meant to override encoding information in HTTP headers, and it does not do that.
The charset attribute specifies the character encoding for the HTML document. The HTML5 specification encourages web developers to use UTF-8, which covers almost all of the characters and symbols in the world!
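To make the precedence concrete, here is a minimal Python sketch of the lookup order described above; the function name and the UTF-8 default are illustrative assumptions, not any browser's actual algorithm:

```python
from typing import Optional

# Hypothetical helper illustrating the precedence described above: an encoding
# declared at the HTTP protocol level wins, the charset attribute is only a
# fallback, and UTF-8 is assumed as a last resort for this sketch.
def pick_script_encoding(http_charset: Optional[str],
                         attr_charset: Optional[str]) -> str:
    if http_charset:        # the HTTP header already names an encoding: use it
        return http_charset
    if attr_charset:        # otherwise fall back to the charset attribute
        return attr_charset
    return 'utf-8'          # assumed default, per HTML5's recommendation

print(pick_script_encoding('iso-8859-1', 'utf-8'))  # 'iso-8859-1' -- header wins
print(pick_script_encoding(None, 'utf-8'))          # 'utf-8' -- attribute fills the gap
```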
Basically:
Every encoding has a particular charset associated with it, but there can be more than one encoding for a given charset. A charset is simply what it sounds like, a set of characters. There are a large number of charsets, including many that are intended for particular scripts or languages.
That said, we are well along in the transition to Unicode, whose character set is capable of representing almost all the world's scripts. However, there are multiple encodings for Unicode. An encoding is a way of mapping a string of characters to a string of bytes. Examples of Unicode encodings include UTF-8, UTF-16 BE, and UTF-16 LE. Each of these has advantages for particular applications or machine architectures.
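A short Python demonstration of that last point: the same characters produce a different byte string under each Unicode encoding (the sample string is arbitrary):

```python
# One charset (Unicode), several encodings: identical characters map to
# different byte sequences depending on the encoding chosen.
s = 'héllo'
for enc in ('utf-8', 'utf-16-be', 'utf-16-le'):
    print(f'{enc:>9}: {s.encode(enc).hex(" ")}')
#     utf-8: 68 c3 a9 6c 6c 6f
# utf-16-be: 00 68 00 e9 00 6c 00 6c 00 6f
# utf-16-le: 68 00 e9 00 6c 00 6c 00 6f 00
```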
In addition to the other answers, I think this article is a good read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The essay is from 2003, but (unfortunately) the content is still valid...
A character encoding consists of:

1. The set of characters to be encoded
2. An assignment of numeric code points to those characters
3. A mapping from code points to code units (fixed-size integers, e.g. the 16-bit units of UTF-16)
4. A serialization of code units into bytes (e.g. big-endian vs. little-endian)
Step #1 by itself is a "character repertoire" or abstract "character set", and #1 + #2 = a "coded character set".
But back before Unicode became popular and everyone (except East Asians) was using a single-byte encoding, steps #3 and #4 were trivial (code point = code unit = byte). Thus, older protocols didn't clearly distinguish between "character encoding" and "coded character set". Older protocols use charset when they really mean encoding.
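To see steps #2–#4 separately, it helps to pick a character outside the Basic Multilingual Plane, where UTF-16's code units and code points actually diverge. A small Python walkthrough (the character choice is arbitrary):

```python
ch = '𝄞'                       # MUSICAL SYMBOL G CLEF
cp = ord(ch)                   # step #2: character -> code point
print(hex(cp))                 # 0x1d11e, i.e. U+1D11E

# step #3: code point -> code units. In UTF-16 this code point needs
# two 16-bit code units (a surrogate pair).
be = ch.encode('utf-16-be')
units = [int.from_bytes(be[i:i+2], 'big') for i in range(0, len(be), 2)]
print([hex(u) for u in units])             # ['0xd834', '0xdd1e']

# step #4: code units -> bytes. The same code units serialize differently
# depending on byte order.
print(be.hex(' '))                         # d8 34 dd 1e  (big-endian)
print(ch.encode('utf-16-le').hex(' '))     # 34 d8 1e dd  (little-endian)
```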
To shed some more light for people visiting later; hopefully it will be helpful.
Each language has its own characters, and the collection of those characters forms that language's "character set". When a character is encoded, it is assigned a unique identifier, a number called a code point. In a computer, these code points are represented by one or more bytes.
Examples of character sets: ASCII (covers all English-language characters), ISO/IEC 646, Unicode (covers characters from all living languages in the world)
A coded character set is a set in which a unique number is assigned to each character. That unique number is called a "code point".
Coded character sets are sometimes called code pages.
Encoding is the mechanism that maps code points to bytes so that a character can be read and written uniformly across different systems using the same encoding scheme.
Examples of encodings: ASCII, and the Unicode encoding schemes UTF-8, UTF-16, UTF-32.
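In Python terms (just one convenient way to poke at these definitions), ord() looks up a code point in the coded character set, while encode() and decode() apply an encoding scheme:

```python
# Code point lookup: the coded character set assigns क the number U+0915.
print(hex(ord('क')))                  # 0x915

# Encoding: the code point becomes different bytes under each scheme.
print('क'.encode('utf-8').hex(' '))   # e0 a4 95

# Decoding reverses the mapping, so two systems agreeing on the scheme
# read and write the character uniformly.
print(bytes.fromhex('e0 a4 95').decode('utf-8'))  # क
```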
For example, the Devanagari character क (code point U+0915) is represented by two bytes (09 15) when using the UTF-16 encoding, by three bytes with UTF-8 (E0 A4 95), or by four bytes with UTF-32 (00 00 09 15). Likewise, the character ü (code point U+00FC) is a single byte (FC) in a one-byte encoding such as ISO 8859-1, while in UTF-8 it is represented as C3 BC, and in UTF-16 (with a byte order mark) as FE FF 00 FC.
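Those byte sequences are easy to check in Python (using the codecs module's byte-order-mark constant for the UTF-16 case):

```python
import codecs

# क, code point U+0915, under three Unicode encodings
print('क'.encode('utf-16-be').hex(' '))   # 09 15
print('क'.encode('utf-8').hex(' '))       # e0 a4 95
print('क'.encode('utf-32-be').hex(' '))   # 00 00 09 15

# ü, code point U+00FC
print('ü'.encode('latin-1').hex())        # fc
print('ü'.encode('utf-8').hex(' '))       # c3 bc
print((codecs.BOM_UTF16_BE + 'ü'.encode('utf-16-be')).hex(' '))  # fe ff 00 fc
```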