HTML/XSS escape on input vs output

Tags:

model-view-controller

xss

From everything I've seen, it seems like the convention for escaping html on user-entered content (for the purposes of preventing XSS) is to do it when rendering content. Most templating languages seem to do it by default, and I've come across things like this stackoverflow answer arguing that this logic is the job of the presentation layer.

So my question is, why is this the case? To me it seems cleaner to escape on input (i.e. form or model validation) so you can work under the assumption that anything in the database is safe to display on a page, for the following reasons:

1. Variety of output formats - for a modern web app, you may be using a combination of server-side html rendering, a JavaScript web app using AJAX/JSON, and a mobile app that receives JSON (and which may or may not have some webviews, which may be JavaScript apps or server-rendered html). So you have to deal with html escaping all over the place. But input will always get instantiated as a model (and validated) before being saved to the db, and your models can all inherit from the same base class.
2. You already have to be careful about input to prevent code-injection attacks (granted this is usually abstracted to the ORM or db cursor, but still), so why not also worry about html escaping here so you don't have to worry about anything security-related on output?

I would love to hear the arguments as to why html escaping on page render is preferred.

838

asked Jun 28 '12 22:06

danny

2 Answers

In addition to what has been written already:

- Precisely because you have a variety of output formats, you cannot guarantee that all of them will need HTML escaping. If you are serving data over a JSON API, you have no idea whether the client needs it for an HTML page or a text output (e.g. an email). Why should you force your client to unescape "Jack &amp; Jill" to get "Jack & Jill"?
- You are corrupting your data by default.
  - When someone does a keyword search for 'amp', they get "Jack & Jill". Why? Because you've corrupted your data.
  - Suppose one of the inputs is a URL: http://example.com/?x=1&y=2. You want to parse this URL and extract the y parameter if it exists. This silently fails, because your URL has been corrupted into http://example.com/?x=1&amp;y=2.
- It's simply the wrong layer to do it - HTML-related stuff should not be mixed up with raw HTTP handling. The database shouldn't be storing things that are related to one possible output format.
- XSS and SQL injection are not the only security problems; there are issues for every output you deal with - such as the filesystem (think extensions like '.php' that cause web servers to execute code) and SMTP (think newline characters), and any number of others. Thinking you can "deal with security on input and then forget about it" decreases security. Rather, you should be delegating escaping to specific backends that don't trust their input data.
- You shouldn't be doing HTML escaping "all over the place". You should be doing it exactly once for every output that needs it - just like with any escaping for any backend. For SQL, you should be doing SQL escaping once, and the same goes for SMTP etc. Usually, you won't be doing any escaping yourself - you'll be using a library that handles it for you.

  If you are using sensible frameworks/libraries, this is not hard. I never manually apply SQL/SMTP/HTML escaping in my web apps, and I never have XSS/SQL injection vulnerabilities. If your method of building web pages requires you to remember to apply escaping, or else end up with a vulnerability, you are doing it wrong.
- Doing escaping at the form/HTTP input level doesn't ensure safety, because nothing guarantees that data doesn't get into your database or system by another route. You've got to manually ensure that all inputs to your system are applying HTML escaping.

  You may say that you don't have other inputs, but what if your system grows? It's often too late to go back and change your decision, because by this time you've got a ton of data, and may have compatibility with external interfaces (e.g. public APIs) to worry about, which are all expecting the data to be HTML escaped.
- Even web inputs to the system are not safe, because often you have another layer of encoding applied, e.g. you might need base64-encoded input at some entry point. Your automatic HTML escaping will miss any HTML encoded within that data. So you will have to do HTML escaping again, remember to do so, and keep track of where you have done it.
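
The "corrupting your data" point is easy to reproduce. Here is a minimal sketch using only the Python standard library; `store_escaped` is a hypothetical stand-in for an escape-on-input step, not something from any particular framework:

```python
import html
from urllib.parse import parse_qs, urlsplit


def store_escaped(value: str) -> str:
    """Hypothetical escape-on-input step: HTML-escape before saving."""
    return html.escape(value)


name = "Jack & Jill"
url = "http://example.com/?x=1&y=2"

stored_name = store_escaped(name)  # what would end up in the database
stored_url = store_escaped(url)

# A keyword search for "amp" now matches a name that never contained it.
print("amp" in name)         # False
print("amp" in stored_name)  # True

# Parsing the raw URL finds the y parameter; on current Python versions,
# parsing the stored (escaped) URL does not.
raw_params = parse_qs(urlsplit(url).query)
stored_params = parse_qs(urlsplit(stored_url).query)
print("y" in raw_params)
print("y" in stored_params)
```

Storing the raw value and escaping only when writing HTML avoids both problems, because search and URL parsing then see the data the user actually entered.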

I've expanded on these here: http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/
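
As an illustration of "a library that handles it for you", here is a rough sketch using Python's built-in `sqlite3` module with parameter placeholders. The table name and the hostile-looking value are made up for the example; the point is only that the value is never spliced into the SQL text by hand:

```python
import sqlite3

# In-memory database, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Made-up input that would break naive string concatenation.
hostile = "Robert'); DROP TABLE users;--"

# The "?" placeholder lets the driver pass the value separately from the SQL.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

row = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchone()

# The value comes back byte-for-byte as entered; nothing was escaped in storage.
print(row[0] == hostile)
```

The database backend deals with its own quoting rules at its own boundary, and the stored data stays raw for every other output.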

142

answered Sep 19 '22 16:09

spookylukey

The original misconception

Do not confuse sanitization of output with validation.

While <script>alert(1);</script> is a perfectly valid username, it definitely must be escaped before showing on the website.
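
A small sketch of that distinction, with an invented validation rule (the 30-character limit and both function names are assumptions for illustration only):

```python
import html

MAX_USERNAME_LENGTH = 30  # hypothetical business rule


def is_valid_username(username: str) -> bool:
    # Validation asks "is this acceptable data?", not "is this safe HTML?".
    return 0 < len(username) <= MAX_USERNAME_LENGTH


def render_profile_heading(username: str) -> str:
    # Escaping happens here, at the moment the value is placed into HTML.
    return "<h1>{}</h1>".format(html.escape(username))


username = "<script>alert(1);</script>"

print(is_valid_username(username))       # True
print(render_profile_heading(username))  # <h1>&lt;script&gt;alert(1);&lt;/script&gt;</h1>
```

The username passes validation and is stored unchanged; only the HTML rendering step knows it has to escape it.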

And yes, there is such a thing as "presentation logic", which is not related to "domain business logic". That presentation logic is what the presentation layer deals with, and the View instances in particular. In a well-written MVC, Views are full-blown objects (contrary to what RoR would try to tell you), which, when applied in a web context, juggle multiple templates.

About your reasons

Different output formats should be handled by different views. The rules and restrictions that govern HTML, XML, JSON and other formats are different in each case.
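
Roughly, that could look like the following sketch, where each view applies the rules of its own format to the same raw model data. The three view functions and the `user` dict are hypothetical, and plain functions stand in for full View objects:

```python
import html
import json


def html_view(user: dict) -> str:
    # HTML output: entities must be escaped.
    return "<li>{}</li>".format(html.escape(user["name"]))


def json_view(user: dict) -> str:
    # JSON output: the serializer applies JSON quoting rules, not HTML ones.
    return json.dumps({"name": user["name"]})


def text_view(user: dict) -> str:
    # Plain-text output (e.g. an email body): the raw value is what is wanted.
    return "Name: {}".format(user["name"])


user = {"name": "Jack & Jill"}  # stored raw, exactly as entered

print(html_view(user))  # <li>Jack &amp; Jill</li>
print(json_view(user))  # {"name": "Jack & Jill"}
print(text_view(user))  # Name: Jack & Jill
```

Because the model holds the original value, adding another output format later means adding another view, not migrating stored data.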

You always need to store the original input (sanitized to avoid injections, if you are not using prepared statements), because someone might need to edit it at some point.

And storing both the original and the xss-safe "public" version is a waste. If you want to store sanitized output because it takes too many resources to sanitize it each time, then you are already pissing at the wrong tree. This is a case where you use a cache instead of polluting the database.
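
To make the cache suggestion concrete, here is a minimal sketch in which `functools.lru_cache` stands in for whatever cache a real application would use (that choice, and the `render_comment` function, are assumptions for the example):

```python
import html
from functools import lru_cache


@lru_cache(maxsize=1024)
def render_comment(raw: str) -> str:
    # The escaped form lives only in the cache; the database keeps the raw text.
    return "<p>{}</p>".format(html.escape(raw))


raw = "Tom & Jerry <3"

first = render_comment(raw)   # computed and cached
second = render_comment(raw)  # served from the cache

print(first)                             # <p>Tom &amp; Jerry &lt;3</p>
print(render_comment.cache_info().hits)  # 1
```

The cached output can be thrown away and regenerated at any time, which is not true of escaped data written into the database.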

answered Sep 19 '22 16:09

tereško
