
How safe is client-side HTML Sanitization?

I have been looking at Pagedown.js lately, drawn by the allure of using Markdown on my pages instead of ugly read-only textareas.

I am extremely cautious, though, as it seems easy enough to dupe the sanitizing converter. I have seen some discussion around Angular.js and its HTML bindings, and I also heard, when Knockout.js 3.0 came out, that its html binding had previously been unsafe.

It would seem that all someone would need to do to disable the sanitizer in Pagedown.js, for instance, is something like:

var safeConverter = new Markdown.Converter();
// safeConverter is open to script injection

safeConverter = Markdown.getSanitizingConverter();
// safeConverter is now safe

// Override getSanitizingConverter (pseudo-code) so that it hands back
// a plain, non-sanitizing converter instance
Markdown.getSanitizingConverter = function () {
    return new Markdown.Converter();
};

and they could open a site up to script injection. Is that not true?

Edit

Then why would libraries like that package a sanitizer for client-side use? Sure, they say not to render unsanitized HTML, but the next line says to use Markdown.Sanitizer.

How is Angular not open to this with its sanitizer service, or is that just a farce as well?

PW Kad asked May 23 '14 22:05



2 Answers

Pagedown can run on the server as well as the client.

For sanitizing HTML on the client, it makes more sense to sanitize on output rather than on input. You wouldn't sanitize before sending data to a server, but you might sanitize after receiving data from a server.

Imagine making a web-service call on the client and obtaining data from a third-party service. It could be passed through a sanitizer on the client before being rendered. The user could disable the sanitization on their own computer, but they're only hurting themselves.
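As a rough illustration of sanitizing on output, here is a minimal sketch. It assumes plain text is wanted, so it simply escapes HTML-significant characters rather than whitelisting tags the way a real sanitizer such as Pagedown's does. The names `escapeHtml` and `renderComment`, and the shape of the `comment` object, are hypothetical.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
// This is a simpler stand-in for a whitelist-based sanitizer.
function escapeHtml(text) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// Hypothetical renderer: data obtained from a third party is escaped at
// output time, just before it is turned into markup for the page.
function renderComment(comment) {
  return '<p class="comment">' + escapeHtml(comment.body) + '</p>';
}

// Example: a malicious payload arrives from the third-party service.
console.log(renderComment({ body: '<script>alert(1)</script>' }));
// prints: <p class="comment">&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

A user who replaces `escapeHtml` in their own browser console only changes what their own page renders; other visitors still run the original code.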

It's also useful for non-security reasons, simply to prevent user input from accidentally breaking the formatting of the surrounding page, for example when typing an HTML post with a real-time preview (as on Stack Overflow).

fgb answered Sep 23 '22 10:09


I believe there is a little misunderstanding about the purpose and nature of such "sanitizers".

The purpose of a sanitizer (e.g. Angular's ngSanitize) is not to prevent "bad" data from being sent to the server. It is rather the other way around: a sanitizer is there to protect the non-malicious user from malicious data, whether that data results from a security hole on the server side (no setup is perfect) or is fetched from other sources that you do not control.

Of course, being a client-side feature, a sanitizer can be bypassed. But since the sanitizer is there to protect the user (not the server), bypassing it only leaves the bypasser unprotected. You can't do anything about that, nor should you care: it's their choice.

Furthermore, sanitizers can have another, potentially more important, role: a sanitizer is a tool that helps developers organize their code so that it is more easily testable for certain kinds of vulnerabilities (e.g. XSS attacks), and it even helps when auditing the code for such security holes.

In my opinion, the Angular docs summarize the concept pretty neatly:

Strict Contextual Escaping (SCE) is a mode in which AngularJS requires bindings in certain contexts to result in a value that is marked as safe to use for that context.
[...]
SCE assists in writing code in a way that (a) is secure by default and (b) makes auditing for security vulnerabilities such as XSS, clickjacking, etc. a lot easier.

[...]
In a more realistic example, one may be rendering user comments, blog articles, etc. via bindings. (HTML is just one example of a context where rendering user controlled input creates security vulnerabilities.)

For the case of HTML, you might use a library, either on the client side, or on the server side, to sanitize unsafe HTML before binding to the value and rendering it in the document.

How would you ensure that every place that used these types of bindings was bound to a value that was sanitized by your library (or returned as safe for rendering by your server?) How can you ensure that you didn't accidentally delete the line that sanitized the value, or renamed some properties/fields and forgot to update the binding to the sanitized value?

To be secure by default, you want to ensure that any such bindings are disallowed unless you can determine that something explicitly says it's safe to use a value for binding in that context. You can then audit your code (a simple grep would do) to ensure that this is only done for those values that you can easily tell are safe - because they were received from your server, sanitized by your library, etc. You can organize your codebase to help with this - perhaps allowing only the files in a specific directory to do this. Ensuring that the internal API exposed by that code doesn't markup arbitrary values as safe then becomes a more manageable task.

Note 1: Emphasis is mine.
Note 2: Sorry for the lengthy quote, but I consider this to be a very important (and equally sensitive) matter, and one that is too often misunderstood.
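The "secure by default" idea from the quoted docs can be sketched in plain JavaScript. This is only an illustration of the pattern, not AngularJS's actual SCE implementation or API: the names `TrustedHtml`, `sanitizeToTrustedHtml` and `bindHtml` are hypothetical, and the "sanitizer" here is a stand-in that escapes everything.

```javascript
// Hypothetical wrapper type: only values of this type may be bound as HTML.
class TrustedHtml {
  constructor(html) {
    this.html = html;
  }
}

// Hypothetical sanitizer: the one audited place that is allowed to create
// TrustedHtml values. As a stand-in, it escapes all HTML-significant
// characters instead of applying a tag whitelist.
function sanitizeToTrustedHtml(untrusted) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  const escaped = String(untrusted).replace(/[&<>"']/g, (ch) => map[ch]);
  return new TrustedHtml(escaped);
}

// Hypothetical binding: refuses anything not explicitly marked as safe,
// so forgetting to sanitize fails loudly instead of rendering raw input.
function bindHtml(value) {
  if (!(value instanceof TrustedHtml)) {
    throw new TypeError('Refusing to bind a value that is not marked as trusted HTML');
  }
  return value.html;
}

// Usage: raw strings are rejected, sanitized values are accepted.
try {
  bindHtml('<img src=x onerror=alert(1)>');
} catch (e) {
  console.log(e.message);
}
console.log(bindHtml(sanitizeToTrustedHtml('<img src=x onerror=alert(1)>')));
```

With this arrangement, auditing reduces to searching for the places that construct `TrustedHtml`, which is the "simple grep" the quote refers to.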

gkalpak answered Sep 23 '22 10:09