Can I use ActionView::Helpers::SanitizeHelper#sanitize on user-entered text that I plan on showing to other users? E.g., will it properly handle all cases described on this site?
Also, the documentation mentions:
Please note that sanitizing user-provided text does not guarantee that the resulting markup is valid (conforming to a document type) or even well-formed. The output may still contain e.g. unescaped ’<’, ’>’, ’&’ characters and confuse browsers.
What's the best way to handle this? Pass the sanitized text through Hpricot
before displaying?
Sanitizes HTML input, stripping all tags and attributes that aren't whitelisted. It also strips href/src attributes with unsafe protocols like javascript:, while also protecting against attempts to use Unicode, ASCII, and hex character references to work around these protocol filters.
So as you can see, Rails in fact sanitizes it for you, so long as you pass the parameter in as a hash, or method parameter (depending on which query method you're using).
In data sanitization, HTML sanitization is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated "safe" and desired.
h is just alias for html_escape. It is a utility method commonly used to escape html and javascript from user input forms.
Ryan Grove's Sanitize goes a lot farther than Rails 3 sanitize
. It ensures the output HTML is well-formed and has three built-in whitelists:
Sanitize::Config::RESTRICTED Allows only very simple inline formatting markup. No links, images, or block elements.
Sanitize::Config::BASIC Allows a variety of markup including formatting tags, links, and lists. Images and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto protocols, and a attribute is added to all links to mitigate SEO spam.
Sanitize::Config::RELAXED Allows an even wider variety of markup than BASIC, including images and tables. Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images are limited to HTTP and HTTPS. In this mode, is not added to links.
Sanitize is certainly better than the "h" helper. Instead of escaping everything, it actually allows the html tags that you specify. And yes, it does prevent cross-site scripting because it removes javascript from the mix entirely.
In short, both will get the job done. Use "h" when you don't expect anything other than plaintext, and use sanitize when you want to allow some, or you believe people may try to enter it. Even if you disallow all tags with sanitize, it'll "pretty up" the code by removing them instead of escaping them as "h" does.
As for incomplete tags: You could run a validation on the model that passes html-containing fields through hpricot, but I think this is overkill in most applications.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With