Let's say I have a simple ASP.NET MVC blog application and I want to allow readers to add comments to a blog post. To prevent any type of XSS shenanigans, I could HTML-encode all comments so that they become harmless when rendered. However, what if I wanted to allow some basic formatting, like hyperlinks, bold, italics, etc.?
I know that Stack Overflow uses the WMD Markdown Editor, which seems like a great choice for what I'm trying to accomplish, except that it accepts both HTML and Markdown, which leaves it open to XSS attacks.
To protect yourself from XSS, you must sanitize your input: your application should never send user-supplied data to the browser without first checking it for malicious code. For more details, see the articles "Preventing XSS Attacks" and "How to Prevent DOM-based Cross-site Scripting".
A web application firewall (WAF) can add another layer of protection against XSS attacks. A WAF can filter bots and other malicious traffic that may indicate an attack, so suspicious requests are blocked before any injected script is executed.
Cross-site scripting prevention can generally be achieved with two layers of defense: encoding data on output and validating input on arrival.
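The two layers can be sketched as follows. This is Python rather than the C# of an ASP.NET MVC app, and the length limit and function names are hypothetical, chosen only for illustration. Validation happens once when the comment arrives; encoding happens every time it is written into HTML.

```python
import html

MAX_COMMENT_LENGTH = 2000  # hypothetical limit, chosen for illustration


def validate_comment(text: str) -> str:
    """Validate input on arrival: reject empty or oversized comments."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("comment is empty")
    if len(stripped) > MAX_COMMENT_LENGTH:
        raise ValueError("comment is too long")
    return stripped


def render_comment(text: str) -> str:
    """Encode data on output: escape &, <, >, " and ' before writing into HTML."""
    return '<div class="comment">' + html.escape(text, quote=True) + "</div>"


if __name__ == "__main__":
    stored = validate_comment("  <script>alert('xss')</script>  ")
    print(render_comment(stored))
```

Because encoding is applied at render time, a stored `<script>` tag is displayed as text instead of being executed.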
If you are not looking to use an editor, you might consider OWASP's AntiSamy, which checks user-supplied HTML against a configurable policy.
You can run an example here: http://www.antisamy.net/
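The allowlist idea behind a policy-based sanitizer can be sketched with Python's standard `html.parser`. This is not AntiSamy's API or its default policy: the allowed tags, attributes and URL schemes below are a hypothetical policy. Anything not explicitly allowed is dropped, and all text is re-escaped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: allowed tags mapped to the attributes each may keep.
ALLOWED = {"b": set(), "strong": set(), "i": set(), "em": set(), "a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")
DROP_CONTENT = {"script", "style"}  # tags whose text is discarded entirely


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # allowed tags currently open, for well-formed output
        self.skip = 0  # nesting depth inside DROP_CONTENT tags

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED or self.skip:
            return  # disallowed tag is dropped; its text is still kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript:, data: and relative URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        if tag not in self.open_tags:
            return  # stray or disallowed closing tag
        # Close tags opened after this one as well, so nesting stays valid.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:  # close tags the input left open
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    return parser.result()
```

An allowlist fails closed: a tag or attribute nobody thought about is removed rather than let through, which is why this approach is generally preferred over trying to blocklist dangerous markup.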
How much HTML are you going to support? Just bold/italics/the basic stuff? In that case, you can convert those to markdown syntax and then strip the rest of the HTML.
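That conversion can be sketched with Python's standard `html.parser`. The tag-to-marker mapping is a hypothetical choice covering only bold and italics; every other tag is stripped, and the contents of `<script>` and `<style>` are discarded.

```python
from html.parser import HTMLParser

# Hypothetical mapping from supported HTML tags to markdown markers.
TO_MARKDOWN = {"b": "**", "strong": "**", "i": "*", "em": "*"}


class HtmlToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0  # depth inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in TO_MARKDOWN and not self.skip:
            self.out.append(TO_MARKDOWN[tag])
        # every other tag (and all attributes) is stripped

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in TO_MARKDOWN and not self.skip:
            self.out.append(TO_MARKDOWN[tag])

    def handle_data(self, data):
        if not self.skip:
            self.out.append(data)


def html_to_markdown(fragment: str) -> str:
    parser = HtmlToMarkdown()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

The result is plain markdown text, not HTML: entities such as `&lt;` are decoded to literal characters, so the output still has to be escaped when it is rendered.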
The stripping needs to be done server-side, before you store the comment. You need to validate the input on the server anyway, alongside checking for SQL injection and other unwanted content.
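For the SQL side, the usual defense is a parameterized query rather than inspecting the text. A minimal sketch with Python's built-in `sqlite3`; the `comments` table and `save_comment` helper are hypothetical, and the same principle applies to parameterized commands in ADO.NET or an ORM.

```python
import sqlite3


def save_comment(conn: sqlite3.Connection, post_id: int, body: str) -> int:
    """Store a comment using a parameterized query.

    The ? placeholders make the driver pass values separately from the SQL
    text, so a body like "'); DROP TABLE comments; --" is stored as plain text.
    """
    cur = conn.execute(
        "INSERT INTO comments (post_id, body) VALUES (?, ?)", (post_id, body)
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
    )
    new_id = save_comment(conn, 1, "'); DROP TABLE comments; --")
    print(conn.execute("SELECT body FROM comments WHERE id = ?", (new_id,)).fetchone())
```

Parameterization handles SQL injection only; the stored text still needs HTML sanitizing or encoding before it reaches a page.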
If you need to do it in the browser: http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
I'd suggest you only submit the markdown syntax. On the front end, the client can type markdown and see an HTML preview (as on Stack Overflow), but only the markdown is submitted to the server. There you can validate it, generate the HTML (escaping any raw HTML in the submission so it cannot pass through), and store the result.
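The server-side step can be sketched with a deliberately tiny, hand-rolled markdown subset. This is not WMD or any real markdown library; the three patterns are hypothetical and exist only to show the ordering: escape the whole submission first, then turn the few allowed constructs into a fixed set of tags, so raw HTML can never survive.

```python
import html
import re

# Hypothetical minimal markdown subset: **bold**, *italic*, [text](http(s) URL).
BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"\*(.+?)\*")
# Only http/https URLs become links, so javascript: links are left as text.
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)*]+)\)")


def render_markdown_subset(source: str) -> str:
    # 1. Escape everything first, so any raw HTML in the submission becomes text
    #    and quotes cannot break out of the href attribute generated below.
    text = html.escape(source, quote=True)
    # 2. Then convert the allowed markdown constructs into known-safe tags.
    text = LINK.sub(r'<a href="\2">\1</a>', text)
    text = BOLD.sub(r"<strong>\1</strong>", text)
    text = ITALIC.sub(r"<em>\1</em>", text)
    return text
```

In a real application you would more likely use an established markdown library with raw HTML disabled, or run its output through an allowlist sanitizer, rather than maintain regular expressions like these.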
I believe that's the way most of us do it. In either case, markdown is there to spare users from writing structured HTML and to give formatting power to those who wouldn't even know how to write it.
If there's something specific you'd like to do with the HTML, you can tweak it with some CSS scoped to comments, e.g. '.comment a { color: #F0F; }', with front-end JavaScript, or by traversing the HTML generated from the markdown before you store it.
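As an example of the traversal option, here is a sketch that walks HTML your own markdown step produced and forces `rel="nofollow"` onto every link, a common tweak for user comments. The choice of `nofollow` and the `add_nofollow` helper are hypothetical; this re-serializer assumes trusted, generated markup and is not a sanitizer for arbitrary input.

```python
from html import escape
from html.parser import HTMLParser


class NofollowLinks(HTMLParser):
    """Re-serialize an HTML fragment, forcing rel="nofollow" on every <a>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _serialize(tag, attrs):
        if tag == "a":
            # replace any existing rel attribute with rel="nofollow"
            attrs = [(k, v) for k, v in attrs if k != "rel"] + [("rel", "nofollow")]
        parts = "".join(
            f' {k}="{escape(v, quote=True)}"' if v is not None else f" {k}"
            for k, v in attrs
        )
        return f"<{tag}{parts}"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._serialize(tag, attrs) + ">")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._serialize(tag, attrs) + " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def add_nofollow(generated_html: str) -> str:
    parser = NofollowLinks()
    parser.feed(generated_html)
    parser.close()
    return "".join(parser.out)
```

Doing this once before storage means the stored HTML can be served as-is, instead of repeating the traversal on every page view.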