I'm confused about the difference between the terms "escaping" and "encoding" in phrases like:
Xml Encoding
Xml Escaping
Encoded Html
Escaped Url
...
Can anyone explain it to me?
Escapes or unescapes an XML file removing traces of offending characters that could be wrongfully interpreted as markup.
Encoding is transforming data from one format into another format. Escaping is a subset of encoding, where not all characters need to be encoded. Only some characters are encoded (by using an escape character).
An ampersand can also be escaped as &amp;amp; in element content of XML.
XML escape characters: there are only five. " is written as &amp;quot;, ' as &amp;apos;, < as &amp;lt;, > as &amp;gt;, and & as &amp;amp;. Which characters must be escaped depends on where the special character is used. The examples can be validated at the W3C Markup Validation Service.
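The five escapes listed above can be tried out with a short sketch. It assumes Python and its standard xml.sax.saxutils module; the helper names escape_xml and unescape_xml are made up for this illustration and are not part of any library.

```python
from xml.sax.saxutils import escape, unescape

# escape() replaces &, < and > on its own; the two quote characters
# are passed in through the optional "entities" mapping.
QUOTE_ESCAPES = {'"': "&quot;", "'": "&apos;"}
QUOTE_UNESCAPES = {"&quot;": '"', "&apos;": "'"}


def escape_xml(text: str) -> str:
    """Escape all five XML special characters (hypothetical helper)."""
    return escape(text, QUOTE_ESCAPES)


def unescape_xml(text: str) -> str:
    """Reverse escape_xml (hypothetical helper)."""
    return unescape(text, QUOTE_UNESCAPES)


sample = 'Tom & "Jerry" <3'
escaped = escape_xml(sample)
print(escaped)                # Tom &amp; &quot;Jerry&quot; &lt;3
print(unescape_xml(escaped))  # Tom & "Jerry" <3
```

Note that the ampersand is replaced first when escaping and last when unescaping; otherwise the & inside an already produced &lt; would be escaped a second time.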
Encoding describes how the file's characters are physically written in binary (for example as UTF-8 or UTF-16 for Unicode text, or in a legacy "ANSI" code page).
Escaping refers to the process of replacing special characters (such as < and >) with their XML entity equivalent (such as &amp;lt; and &amp;gt;). For URLs, escaping refers to replacing characters with strings starting with %, such as %20 for a single space.
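As a rough illustration of the URL case, here is a small sketch that assumes Python's standard urllib.parse module; the sample strings are made up. It also shows how the two terms meet: a non-ASCII character is first encoded to bytes (UTF-8 by default in quote) and each byte is then escaped with %.

```python
from urllib.parse import quote, unquote

raw = "hello world & more"

# safe="" asks quote() to escape every reserved character, including "/".
escaped = quote(raw, safe="")
print(escaped)           # hello%20world%20%26%20more
print(unquote(escaped))  # hello world & more

# Encoding and escaping combined: "é" becomes the UTF-8 bytes C3 A9,
# and each of those bytes is then percent-escaped.
print("é".encode("utf-8"))  # b'\xc3\xa9'
print(quote("é"))           # %C3%A9
```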
Escaping differs by language, but encodings are usually widely accepted standards. Sometimes the terms are used ambiguously (particularly with encoding used to mean escaping), but they are well defined and distinct.
Every web application consists of various layers, such as the view layer, the model layer, the database layer, and so on. Each layer is "supposed" to be developed independently to satisfy various scalability and maintainability requirements.
Every layer needs to "talk" to the others, and they have to agree on a language in which to talk. This is called encoding. Various encodings exist, such as ASCII, UTF-8, and UTF-16. If the user writes in Chinese or Japanese, for instance, ASCII will not work, so they would use UTF-16 or another encoding that can represent Chinese characters. The Chinese characters then pass from the web layer through the business layer to the data layer, and the same encoding scheme has to be used everywhere.
Why?
Suppose your web layer sends data in UTF-16 to support Chinese, but the database layer accepts only ASCII. The database layer would then be confused about what you are saying: it understands only English characters and cannot make sense of the rest. That was encoding.
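The mismatch described above can be reproduced in a few lines. This is only a sketch in Python; the function can_encode and the idea of a layer that wrongly assumes Latin-1 are assumptions made for illustration, not something from the answer.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


text = "中文"  # "Chinese (written language)"

# The same two characters, written as different byte sequences.
utf8_bytes = text.encode("utf-8")       # 3 bytes per character here
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per character here
print(len(utf8_bytes), len(utf16_bytes))  # 6 4

# An ASCII-only layer simply cannot represent these characters.
print(can_encode(text, "ascii"))   # False
print(can_encode(text, "utf-16"))  # True

# A layer that reads the bytes with the wrong encoding (Latin-1 is an
# assumed example) gets garbage instead of the original text.
misread = utf8_bytes.decode("latin-1")
print(misread == text)                     # False
print(utf8_bytes.decode("utf-8") == text)  # True
```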
Escaping:
Certain characters are "metadata": they have a special meaning from the browser's perspective. For example, < and > are metadata to the browser. The browser's parser knows that whatever is contained inside < and > is to be interpreted as markup.
Attackers exploit this to confuse the browser.
For example:
<input type="text" value="${name}" />
If I replace name with the following value (note the leading double quote)
"/><script>alert(document.cookie)</script>
then the resulting code, as the browser sees it, will be
<input type="text" value=""/><script>alert(document.cookie)</script>" />
This means you need to tell the browser that whatever is substituted for name should be "escaped", that is, treated as data only. There are various functions that escape < and > as their HTML equivalents &amp;lt; and &amp;gt; (in a URL, the percent-encoded forms %3C and %3E play the same role), so the browser knows they are to be treated as text rather than markup. Roughly speaking, escaping means letting characters escape their special meaning.
With JSTL, this looks like:
<input type="text" value="${fn:escapeXml(name)}" />
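The same idea can be sketched outside JSP. The following assumes Python's standard html module rather than JSTL; render_input_unsafe and render_input are hypothetical helpers written only for this example.

```python
import html


def render_input_unsafe(name: str) -> str:
    """Insert name as-is: markup inside it is interpreted by the browser."""
    return f'<input type="text" value="{name}" />'


def render_input(name: str) -> str:
    """Escape name first, so the browser treats it purely as data."""
    return f'<input type="text" value="{html.escape(name, quote=True)}" />'


payload = '"/><script>alert(document.cookie)</script>'

print(render_input_unsafe(payload))
# <input type="text" value=""/><script>alert(document.cookie)</script>" />

print(render_input(payload))
# <input type="text" value="&quot;/&gt;&lt;script&gt;alert(document.cookie)&lt;/script&gt;" />
```

In the escaped output the double quote no longer closes the attribute and no script tag is opened, so the payload is displayed as text in the field instead of being run.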