After reading this article, I don't have a clear answer:
http://palizine.plynt.com/issues/2010Oct/bypass-xss-filters/
Will browsers interpret a text/html data URI payload in <img> src as a document where <script> tags are executed?
If not, is it safe to allow data URIs in third-party HTML?
What safety mechanisms exist at the browser level for this use case?
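One common way to approach the "is it safe?" question is to let a sanitizer accept a data: URI in <img src> only when it declares an image type. The Python sketch below illustrates that idea under stated assumptions: the allowlist ALLOWED_IMAGE_TYPES and the function names are hypothetical, SVG is excluded by choice, and only the declared media type is checked, never the payload bytes.

```python
from typing import Optional

# Hypothetical policy: only raster image types are accepted. image/svg+xml is
# deliberately left out because SVG can carry script if it is ever opened as a document.
ALLOWED_IMAGE_TYPES = {"image/png", "image/gif", "image/jpeg", "image/webp"}


def data_uri_media_type(uri: str) -> Optional[str]:
    """Return the lowercased media type declared by a data: URI, or None.

    Uses the RFC 2397 shape data:[<mediatype>][;base64],<data>; an empty
    media type defaults to text/plain, as RFC 2397 specifies.
    """
    value = uri.strip()
    if not value.lower().startswith("data:"):
        return None  # not a data: URI at all
    header, sep, _payload = value[len("data:"):].partition(",")
    if not sep:
        return None  # malformed: no comma between the header and the data
    media_type = header.split(";", 1)[0].strip().lower()
    return media_type or "text/plain"


def is_allowed_data_uri(uri: str) -> bool:
    """Accept a data: URI for <img src> only if it declares an allowlisted type."""
    return data_uri_media_type(uri) in ALLOWED_IMAGE_TYPES
```

With this policy, data:image/png;base64,... passes, while data:text/html;base64,... and data:image/svg+xml,... are rejected before they ever reach the page.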
The MSDN documentation says IE does not:
For security reasons, data URIs are restricted to downloaded resources. Data URIs cannot be used for navigation, for scripting, or to populate frame or iframe elements.
On the other hand, Mozilla does allow iframe and script execution:
data: urls inheriting the origin of their referrer allows them to be used to generate frame or window content with which the parent can interact. Gecko has always done it this way (and we've got a lot of security checks scattered around that have to worry about it).
Safari and Chromium sandbox data URI execution, effectively treating data URIs as cross-domain:
We currently mark data: URIs as having no access to any other origins including other data: URIs.
The HTML5 specification states:
If a Document or image was generated from a data: URL that was returned as the location of an HTTP redirect (or equivalent in other protocols)
The origin is the origin of the URL that redirected to the data: URL.
If a Document or image was generated from a data: URL found in another Document or in a script
The origin is an alias to the origin specified by the incumbent settings object when the navigate algorithm was invoked, or, if no script was involved, of the node document of the element that initiated the navigation to that URL.
If a Document or image was obtained in some other manner (e.g. a data: URL typed in by the user, a Document created using the createDocument() API, a data: URL returned as the location of an HTTP redirect, etc)
The origin is a globally unique identifier assigned when the Document or image is created.
RFC 6454 adds:
A URI is not necessarily same-origin with itself. For example, a data URI [RFC2397] is not same-origin with itself because data URIs do not use a server-based naming authority and therefore have globally unique identifiers as origins.
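To make the RFC 6454 rule concrete, here is a small Python model of origins. It assumes that only http/https URLs get tuple origins and that everything else, including data:, receives a fresh opaque origin each time its origin is computed. The class and function names are hypothetical, and the model deliberately ignores the inheritance cases in the HTML text quoted above.

```python
import itertools
from dataclasses import dataclass
from typing import Union
from urllib.parse import urlsplit

_next_opaque_id = itertools.count()
DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass(frozen=True)
class TupleOrigin:
    scheme: str
    host: str
    port: int


@dataclass(frozen=True)
class OpaqueOrigin:
    # Stands in for the "globally unique identifier" of RFC 6454.
    uid: int


Origin = Union[TupleOrigin, OpaqueOrigin]


def origin_of(url: str) -> Origin:
    """Compute an origin for a URL (simplified: only http/https get tuple origins)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme in DEFAULT_PORTS and parts.hostname:
        # urlsplit already lowercases the hostname; fill in the default port.
        return TupleOrigin(scheme, parts.hostname, parts.port or DEFAULT_PORTS[scheme])
    # data: URIs have no server-based naming authority, so each evaluation
    # yields a brand-new opaque origin that equals nothing else.
    return OpaqueOrigin(next(_next_opaque_id))


def same_origin(a: Origin, b: Origin) -> bool:
    # Frozen dataclasses compare field by field, so two OpaqueOrigins match
    # only if they carry the same identifier.
    return a == b
```

Calling origin_of twice on the same data: URI yields two origins that compare unequal, which mirrors the RFC's statement that a data URI is not same-origin with itself, whereas https://example.com/a and https://example.com:443/b compare equal.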
The CSSHTTPRequest library uses data URIs to do cross-site GET requests, but that is the most it can do across all browsers.
References
HTML Living Standard: Origin
RFC 6454: The Web Origin Concept
It is possible to inject data this way, but note that it is also possible to inject data into the binary data of the images themselves. Either way, nothing is 100% safe. Ever. If you are using the CodeIgniter framework, you can protect yourself quite solidly with
$this->security->xss_clean($str)
Otherwise, you could build your own version of such a script that strips dangerous content with regular expressions. Remember to account for different character encodings when building such a script.
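One concrete place where encodings matter in a home-grown filter is the scheme check on href/src values: the filter should read the value roughly the way the browser will, not as raw text. The Python sketch below assumes a hypothetical DANGEROUS_SCHEMES blocklist and hypothetical function names; it only normalizes HTML character references, stray tabs/newlines, surrounding control characters, and letter case before reading the scheme, so it covers this one check rather than being a complete sanitizer.

```python
import html
import re
from typing import Optional

# Hypothetical blocklist of schemes that can run script or load active content.
DANGEROUS_SCHEMES = {"javascript", "vbscript", "data"}

_TAB_OR_NEWLINE = re.compile(r"[\t\n\r]")
_C0_AND_SPACE = "".join(chr(c) for c in range(0x21))  # U+0000..U+0020
_SCHEME = re.compile(r"^([a-z][a-z0-9+.\-]*):")


def url_scheme(attr_value: str) -> Optional[str]:
    """Return the lowercased scheme of an href/src value, or None if it has none."""
    # 1. The HTML parser resolves character references such as &#106; (j)
    #    and &colon; (:) before the value is treated as a URL.
    value = html.unescape(attr_value)
    # 2. URL parsing drops ASCII tabs/newlines anywhere and trims leading and
    #    trailing C0 controls and spaces.
    value = _TAB_OR_NEWLINE.sub("", value).strip(_C0_AND_SPACE)
    # 3. Schemes are case-insensitive.
    match = _SCHEME.match(value.lower())
    return match.group(1) if match else None


def is_dangerous(attr_value: str) -> bool:
    """Flag values whose effective scheme is on the blocklist."""
    return url_scheme(attr_value) in DANGEROUS_SCHEMES
```

A naive regex looking for the literal text "javascript:" would miss inputs like "&#106;avascript&colon;alert(1)" or "java\tscript:alert(1)"; normalizing first means all of these are flagged, while relative paths and https: URLs are not.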