I'm starting a public-facing project using ASP.NET MVC. I know there are about a billion PHP, Python, and Ruby HTML sanitizers out there, but does anyone have pointers to anything good in .NET? What are your experiences with what is out there? I know Stack Overflow is built on ASP.NET and allows freeform HTML; what does it use?
In data sanitization, HTML sanitization is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated "safe" and desired.
sanitize-html allows you to specify the tags you want to permit, and the permitted attributes for each of those tags. If a tag is not permitted, the tag itself is removed but its contents are kept, except for script, style, and textarea tags, whose contents are discarded as well. The syntax of poorly closed p and img elements is cleaned up.
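To make the allowlist behaviour described above concrete, here is a minimal sketch in Python using only the standard library. It is not the sanitize-html implementation or any .NET library; the names `ALLOWED`, `DROP_CONTENT`, `SAFE_URL_PREFIXES`, `AllowlistSanitizer`, and `sanitize` are hypothetical, and the allowlist itself is an assumption chosen for illustration. It shows the idea only and should not be relied on in place of a maintained sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration: tag -> permitted attribute names.
ALLOWED = {"a": {"href"}, "b": set(), "i": set(), "p": set()}
# Tags whose contents are discarded together with the tag itself.
DROP_CONTENT = {"script", "style", "textarea"}
# Assumption: only absolute links with these prefixes are considered safe.
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # tag is dropped; any text inside it is still emitted
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED[tag]:
                continue  # e.g. onclick, style, target
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # rejects javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so it cannot be reinterpreted as markup.
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<div>text <em>kept</em></div>')` drops the unlisted `div` and `em` tags but keeps their text. The sketch does not balance unclosed tags or handle relative URLs, which a production sanitizer would need to do.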
The Sanitizer interface of the HTML Sanitizer API provides methods to sanitize untrusted strings of HTML, Document and DocumentFragment objects. After sanitization, unwanted elements or attributes are removed, and the returned objects can safely be inserted into a document's DOM.
Sanitizer is used by the views to sanitize potentially dangerous values.
Source: https://github.com/mganss/HtmlSanitizer
A fairly robust sanitizer. It understands and can clean inline styles, but doesn't have a parser that can deal with <style> blocks, so it strips them. It is certainly at, and probably beyond, the level that Microsoft's AntiXSS had reached before it was abandoned.
https://blog.stackoverflow.com/2008/06/safe-html-and-xss/
HtmlRuleSanitizer
Based on your question I have the following suggestions:
I faced the same problem and built HtmlRuleSanitizer, a rule-based, whitelisting HTML sanitizer built on top of the Html Agility Pack.
There is a C# version here.