
What is encoding in XML?

What is encoding in XML? The encoding normally used is UTF-8. How is it different from other encodings, and what is the purpose of using it?

asked Apr 14 '11 by trilawney


People also ask

How is XML encoded?

XML encoding is the process of converting the document's Unicode characters into bytes. When a processor reads the document, it decodes the bytes according to the declared encoding; the encoding is declared through the 'encoding' attribute of the XML declaration.
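For example, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree (the element name and text are made up for illustration), showing a parser honouring a declared encoding:

    import xml.etree.ElementTree as ET

    # The prolog declares how the document's bytes encode its characters.
    doc = "<?xml version='1.0' encoding='ISO-8859-1'?><greeting>café</greeting>"
    data = doc.encode("iso-8859-1")   # the bytes as stored on disk / sent on the wire

    root = ET.fromstring(data)        # the parser reads the declaration and decodes accordingly
    print(root.text)                  # -> café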

What type of encoding does XML use?

XML documents must be encoded in a supported character encoding; every conforming parser is required to understand at least UTF-8 and UTF-16. Some platforms are stricter: for example, XML documents generated in or parsed from national data items (on IBM systems) must be encoded in Unicode UTF-16 in big-endian format, CCSID 1200.

What is UTF encoding in XML?

UTF-8 (Unicode Transformation Format, 8-bit encoding form) is designed for ease of use with existing ASCII-based systems, and it can represent all the characters in the Unicode standard.

What is the default encoding for XML?

UTF-8 is the default character encoding for XML documents. It is also the default encoding for HTML5, CSS, JavaScript, PHP, and SQL.
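In practice that means a parser given bytes with no encoding declaration and no byte-order mark assumes UTF-8. A minimal sketch in Python:

    import xml.etree.ElementTree as ET

    # No encoding declaration and no BOM: per the XML spec, UTF-8 is assumed.
    data = "<word>naïve</word>".encode("utf-8")
    print(ET.fromstring(data).text)   # -> naïve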


2 Answers

A character encoding specifies how characters are mapped onto bytes. Since XML documents are stored and transferred as byte streams, an encoding is necessary to represent the Unicode characters that make up an XML document.
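To illustrate (a small Python sketch, nothing XML-specific): the same character maps to different byte sequences under different encodings.

    # One character, three different byte representations.
    ch = "é"                       # U+00E9
    print(ch.encode("utf-8"))      # b'\xc3\xa9'  (two bytes)
    print(ch.encode("latin-1"))    # b'\xe9'      (one byte)
    print(ch.encode("utf-16-be"))  # b'\x00\xe9'  (two bytes, big-endian)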

UTF-8 is chosen as the default, because it has several advantages:

  • it is compatible with ASCII, in that all valid ASCII-encoded text is also valid UTF-8-encoded text (but not necessarily the other way around!)
  • it uses only 1 byte per character for "common" letters (those that also exist in ASCII)
  • it can represent all existing Unicode characters
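The first two points are easy to check directly (a Python sketch; the specific strings are just examples):

    # ASCII text encodes to identical bytes under ASCII and UTF-8...
    assert "hello".encode("ascii") == "hello".encode("utf-8")

    # ...but UTF-8 text is not necessarily valid ASCII:
    try:
        "café".encode("utf-8").decode("ascii")
    except UnicodeDecodeError as e:
        print(e)   # byte 0xc3 is outside the 7-bit ASCII range

    # "Common" (ASCII) letters cost one byte each; others cost more.
    print(len("hello".encode("utf-8")))   # -> 5
    print(len("café".encode("utf-8")))    # -> 5 (4 characters, é takes 2 bytes)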

Character encodings are a more general topic than just XML. UTF-8 is not restricted to being used in XML only.

What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text is a good article that gives an overview of the topic.

answered Nov 14 '22 by Joachim Sauer


When computers were first created, they mostly worked only with characters found in the English language, leading to the 7-bit US-ASCII standard.

However, there are a lot of different written languages in the world, and ways had to be found to use them in computers.

The first way works fine if you restrict yourself to a single language: use a culture-specific encoding, such as ISO-8859-1, which can represent Latin/Western-European characters in 8 bits, or GB2312 for Chinese characters.
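The trade-off is easy to demonstrate (a Python sketch): a culture-specific encoding covers its own repertoire and nothing outside it.

    # ISO-8859-1 covers Western-European characters...
    print("café".encode("iso-8859-1"))    # b'caf\xe9'

    # ...but has no byte sequence for a Chinese character:
    try:
        "中".encode("iso-8859-1")
    except UnicodeEncodeError as e:
        print(e)   # '中' (U+4E2D) cannot be encoded in ISO-8859-1

    # GB2312 is the opposite: it covers the Chinese character fine.
    print("中".encode("gb2312"))          # b'\xd6\xd0'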

The second way is a bit more complicated, but it theoretically allows every character in the world to be represented: the Unicode standard, in which every character from every language has a specific code point. However, given the high number of existing characters (109,000 in Unicode 5), Unicode code points are commonly written using a three-byte representation (one byte for the Unicode plane, and two bytes for the character code within that plane).
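In Python the code point and plane of a character are easy to inspect (a small sketch; the example characters are arbitrary):

    # Each character has a numeric code point; the high bits select the plane.
    for ch in ("A", "中", "😀"):
        cp = ord(ch)
        print(f"U+{cp:04X}  plane {cp >> 16}")
    # U+0041   plane 0  (Basic Multilingual Plane)
    # U+4E2D   plane 0
    # U+1F600  plane 1  (Supplementary Multilingual Plane)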

In order to maximize compatibility with existing code (some of which still handles text as ASCII), the UTF-8 encoding was devised as a way to store Unicode characters using only the minimal amount of space, as described in Joachim Sauer's answer.

So, it is common to see files encoded in specific charsets such as ISO-8859-1 when the file is meant to be edited or read only by software (and people) understanding those languages, and in UTF-8 when there is a need to be highly interoperable and culture-independent. The current tendency is for UTF-8 to replace other charsets, even though it requires work from software developers, since UTF-8 strings are more complicated to handle than fixed-width charset strings.
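That extra work is visible even in a trivial Python sketch: with a variable-width encoding, the byte count and the character count are no longer the same thing, so code cannot simply index bytes.

    s = "naïve"
    data = s.encode("utf-8")
    print(len(s))      # -> 5 characters
    print(len(data))   # -> 6 bytes ('ï' takes two bytes in UTF-8)

    # Slicing bytes can split a character in half:
    print(data[:3])                                    # b'na\xc3' -- ends mid-character
    print(data[:3].decode("utf-8", errors="replace"))  # 'na�'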

answered Nov 14 '22 by SirDarius