
Xml Escaping/Encoding terminology


I'm confused about the difference between the terms "escaping" and "encoding" in phrases like:

Xml Encoding

Xml Escaping

Encoded Html

Escaped Url

...

Can anyone explain it to me?

asked Apr 18 '09 11:04 by Yaron Naveh


People also ask

What is XML escaping?

XML escaping replaces characters that could otherwise be wrongly interpreted as markup (such as < and &) with escape sequences; unescaping reverses the process.

What is encoding and escaping?

Encoding is transforming data from one format into another format. Escaping is a subset of encoding, where not all characters need to be encoded. Only some characters are encoded (by using an escape character).

How do you escape & in XML?

An ampersand must be escaped as &amp; in XML element content and attribute values. It can also be written as the character reference &#38;.

How do I escape a character in an XML string?

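XML escape characters: there are only five predefined entities:

" → &quot;
' → &apos;
< → &lt;
> → &gt;
& → &amp;

Whether a character must be escaped depends on where it is used: < and & must always be escaped in text and attribute values, " and ' only inside an attribute value delimited by that same quote, and > only where it would form the sequence ]]>. The examples can be validated at the W3C Markup Validation Service.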
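To illustrate that escaping depends on context, here is a minimal Java sketch. The class and method names (XmlEscape, escapeText, escapeAttribute) are hypothetical, not a library API, and the sketch assumes attribute values are always delimited by double quotes.

```java
// Sketch of context-dependent XML escaping (hypothetical helper names).
public class XmlEscape {

    // Element content: '<' and '&' must be escaped; '>' is escaped too,
    // which also rules out an accidental "]]>" sequence.
    static String escapeText(String s) {
        return s.replace("&", "&amp;")   // first, so the '&' of entities added below is not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Attribute value delimited by double quotes: additionally escape '"'.
    // Single quotes are left alone; they are only special inside '...'.
    static String escapeAttribute(String s) {
        return escapeText(s).replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String raw = "Tom & \"Jerry\" <friends>";
        System.out.println("<p>" + escapeText(raw) + "</p>");
        System.out.println("<p title=\"" + escapeAttribute(raw) + "\"/>");
    }
}
```

Escaping all five characters everywhere is also valid XML; the split here only shows which ones are strictly required in each position.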


2 Answers

Encoding describes how a file's characters are physically written as bytes (for example UTF-8, UTF-16, or an "ANSI" code page such as Windows-1252).
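A small Java sketch of what "physically written in binary" means: the same character produces different bytes depending on the encoding. ISO-8859-1 stands in for a single-byte "ANSI"-style code page here because Java guarantees it is available; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows how the same text becomes different bytes under different encodings.
public class EncodingBytes {

    // Render a byte array as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode the text with the given charset and return the bytes as hex.
    static String encode(String text, Charset cs) {
        return hex(text.getBytes(cs));
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // é, U+00E9
        System.out.println("UTF-8:      " + encode(text, StandardCharsets.UTF_8));      // C3 A9
        System.out.println("UTF-16BE:   " + encode(text, StandardCharsets.UTF_16BE));   // 00 E9
        System.out.println("ISO-8859-1: " + encode(text, StandardCharsets.ISO_8859_1)); // E9
    }
}
```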

Escaping refers to the process of replacing special characters (such as < and >) with their XML entity equivalents (such as &lt; and &gt;). For URLs, escaping refers to replacing characters with sequences starting with %, such as %20 for a space character.
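For the URL case, a minimal Java sketch of percent-encoding: every byte of the UTF-8 form that is not an RFC 3986 "unreserved" character becomes %XX. It is meant for a single component such as a path segment or query value (it would also escape "/"), and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Minimal percent-encoder: keeps RFC 3986 "unreserved" characters and
// writes every other byte of the UTF-8 form as %XX.
public class PercentEncode {

    static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9')
            || b == '-' || b == '.' || b == '_' || b == '~';
    }

    static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned 0..255
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("a b<c>")); // a%20b%3Cc%3E
    }
}
```

Note that java.net.URLEncoder targets HTML form encoding and writes a space as + rather than %20, which is one more example of escaping rules differing by context.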

Escaping differs by language, but encodings are usually widely-accepted standards. Sometimes the terms are used ambiguously (particularly with encoding used to mean escaping), but they are well defined and distinct.

answered Oct 01 '22 14:10 by Welbog


Every web application is made up of layers, such as the view layer, the model layer, and the database layer. Each layer is "supposed" to be developed independently to satisfy scalability and maintainability requirements.

Every layer needs to "talk" to the others, so they have to agree on a common language for the text they exchange. This is called encoding. There are various encodings, such as ASCII, UTF-8, and UTF-16. If the user writes in Chinese or Japanese, for instance, ASCII won't work because it has no characters for those scripts, so an encoding that covers them, such as UTF-8 or UTF-16, is needed. The Chinese characters then pass from the web layer through the business layer to the data layer, and the same encoding scheme has to be used everywhere.
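A quick way to check the claim that ASCII cannot carry Chinese text is Java's CharsetEncoder.canEncode, sketched below. The class name and the sample string are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Checks whether a charset can represent every character of a string.
public class CanEncode {

    static boolean fits(String text, Charset cs) {
        // newEncoder() returns a fresh CharsetEncoder; canEncode reports
        // whether every character is representable in that charset.
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // 中文 ("Chinese")
        System.out.println("US-ASCII: " + fits(chinese, StandardCharsets.US_ASCII)); // false
        System.out.println("UTF-8:    " + fits(chinese, StandardCharsets.UTF_8));    // true
        System.out.println("UTF-16:   " + fits(chinese, StandardCharsets.UTF_16));   // true
    }
}
```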

Why?

Suppose your web layer sends data in UTF-16, which supports Chinese, but the database layer accepts only ASCII. The database layer will then misinterpret what it receives: it understands only English (ASCII) characters and cannot make sense of the rest. That is what encoding is about.
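The mismatch described above can be simulated in a few lines of Java: one "layer" turns text into bytes with one charset and the other decodes those bytes with a different one. This is an illustrative sketch with a hypothetical class name, not a model of any particular database driver.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Simulates two layers that disagree on the encoding: the sender writes
// bytes in one charset and the receiver decodes them with another.
public class EncodingMismatch {

    static String roundTrip(String text, Charset sentAs, Charset readAs) {
        byte[] wire = text.getBytes(sentAs);  // sender's view
        return new String(wire, readAs);      // receiver's view
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // 中文
        // Same encoding on both sides: the text survives.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_16, StandardCharsets.UTF_16).equals(chinese)); // true
        // Receiver assumes US-ASCII: bytes outside 0..127 become U+FFFD.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_16, StandardCharsets.US_ASCII).equals(chinese)); // false
    }
}
```

In the second case the bytes that happen to fall in the ASCII range are also decoded, but as unrelated characters, so the receiver gets garbage rather than an error.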

Escaping:

Certain characters, often called metacharacters, have a special meaning from the browser's perspective. For example, < and > are special: the browser's parser interprets whatever appears between them as markup. Attackers exploit this to make the browser treat data they supply as markup. For example:

<input type="text" value="${name}" />

If an attacker supplies the following value for name:

"/><script>alert(document.cookie)</script>

then the resulting code, as the browser sees it, will be:

<input type="text" value=""/><script>alert(document.cookie)</script>" />

This means you need to tell the browser that whatever is substituted for ${name} must be treated as data only, not as markup; in other words, it must be "escaped". Escaping functions replace the special characters with their HTML entity equivalents, for example < with &lt;, > with &gt;, and " with &quot;, so the browser displays them as literal characters instead of interpreting them. Roughly speaking, escaping a character means escaping its special meaning.

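For example, using JSTL's fn:escapeXml function (the name is case-sensitive):

<%@ taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions" %>
<input type="text" value="${fn:escapeXml(name)}" />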
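Outside of JSP, the effect can be sketched in plain Java by building the same input tag with and without escaping. The helper below is hypothetical: it escapes the same five characters that fn:escapeXml handles (&, <, >, ", '), writing the quotes as numeric references, but it is not the JSTL implementation.

```java
// Builds the <input> tag from the example above, once with the raw value
// and once with the value escaped, to show the injected markup neutralised.
public class AttributeEscape {

    // Hypothetical helper: escapes &, <, >, " and ' as entity references.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical stand-in for the template <input type="text" value="${name}" />.
    static String inputTag(String value) {
        return "<input type=\"text\" value=\"" + value + "\" />";
    }

    public static void main(String[] args) {
        String name = "\"/><script>alert(document.cookie)</script>";
        System.out.println(inputTag(name));             // attacker's <script> becomes real markup
        System.out.println(inputTag(escapeHtml(name))); // payload stays inside value="..."
    }
}
```

Escaping the quote is what matters most in this position: without it the payload can close the value attribute even if < and > were handled.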

answered Oct 01 '22 15:10 by Rohit Salecha