I have a website that allows users to enter HTML through a TinyMCE rich editor control. Its purpose is to let users format text using HTML. This user-entered content is then output to other users of the system. However, this means someone could insert JavaScript into the HTML in order to perform an XSS attack on other users of the system.

What is the best way to filter out JavaScript code from an HTML string? Performing a regular expression check for <SCRIPT> tags is a good start, but an evildoer could still attach JavaScript to the onclick attribute of a tag. Is there a foolproof way to strip out all JavaScript code whilst leaving the rest of the HTML untouched?

For my particular implementation, I'm using C#.

How to prevent XSS (Cross Site Scripting) whilst allowing HTML input

3 Answers

If you want to allow some HTML but not all, you should use something like OWASP AntiSamy, which lets you define a whitelist policy of the tags and attributes you allow.

HTMLPurifier might also be an alternative.

It's of key importance that it is a whitelist approach: new attributes and events are added to HTML5 all the time, so any blacklist would fail within a short time, and knowing all "bad" attributes is difficult in the first place.

Edit: Oh, and regex is hard to get right here. HTML can take lots of different forms: tags can be unclosed, attribute values can be unquoted, single-quoted or double-quoted, and you can have line breaks and all kinds of whitespace within the tags, to name a few issues. I would rely on a well-tested library like the ones I mentioned above.
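To make the point about regexes concrete, here is a small sketch in Python (chosen only because its standard library makes the example self-contained; the same failure modes apply to a C# regex). The pattern and the function name `strip_script_tags` are illustrative assumptions, not taken from any library. It shows a blacklist that removes `<script>` elements and three inputs that get past it.

```python
import re

# Naive blacklist (illustrative only): remove <script>...</script> blocks.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL
)


def strip_script_tags(html: str) -> str:
    """Remove <script> elements; everything else passes through untouched."""
    return SCRIPT_BLOCK.sub("", html)


# The obvious case is handled.
print(strip_script_tags("<script>alert(1)</script>"))  # prints an empty line

# An event handler needs no <script> tag at all, so it survives unchanged.
print(strip_script_tags("<img src=x onerror=alert(1)>"))

# Script hidden in a URL also survives unchanged.
print(strip_script_tags("<a href='javascript:alert(1)'>click</a>"))

# Removing the inner match re-assembles a working tag from the outer pieces.
print(strip_script_tags("<scr<script></script>ipt>alert(1)</script>"))
# prints: <script>alert(1)</script>
```

Each bypass could be patched with another pattern, but that is exactly the blacklist treadmill described above.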

111

answered Sep 25 '22 21:09

Erlend

Microsoft have produced their own anti-XSS library, Microsoft Anti-Cross Site Scripting Library V4.0:

The Microsoft Anti-Cross Site Scripting Library V4.0 (AntiXSS V4.0) is an encoding library designed to help developers protect their ASP.NET web-based applications from XSS attacks. It differs from most encoding libraries in that it uses the white-listing technique -- sometimes referred to as the principle of inclusions -- to provide protection against XSS attacks. This approach works by first defining a valid or allowable set of characters, and encodes anything outside this set (invalid characters or potential attacks). The white-listing approach provides several advantages over other encoding schemes.

New features in this version of the Microsoft Anti-Cross Site Scripting Library include:

- A customizable safe list for HTML and XML encoding
- Performance improvements
- Support for Medium Trust ASP.NET applications
- HTML Named Entity Support
- Invalid Unicode detection
- Improved Surrogate Character Support for HTML and XML encoding
- LDAP Encoding Improvements
- application/x-www-form-urlencoded encoding support

It uses a whitelist approach to strip out potential XSS content.
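The "define a safe set of characters and encode everything else" idea from the quoted description can be sketched in a few lines. This is an illustration in Python, not AntiXSS's actual implementation or API; the safe list `SAFE_CHARS` and the function name `encode_outside_safe_list` are assumptions made for the example.

```python
import string

# Hypothetical safe list: ASCII letters, digits and a little punctuation.
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + " .,-_")


def encode_outside_safe_list(text: str) -> str:
    """Encode every character that is not in SAFE_CHARS as a numeric entity."""
    return "".join(
        ch if ch in SAFE_CHARS else f"&#{ord(ch)};" for ch in text
    )


print(encode_outside_safe_list("<b>hi</b>"))
# prints: &#60;b&#62;hi&#60;&#47;b&#62;
```

Note that encoding of this kind turns all markup into inert text, including the formatting tags the user typed. It fits values that should be displayed literally; keeping a chosen subset of tags needs a tag-and-attribute whitelist on top of an HTML parser.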

Here are some relevant links related to AntiXSS:

Anti-Cross Site Scripting Library
Microsoft Anti-Cross Site Scripting Library V4.2 (AntiXSS V4.2)
Microsoft Web Protection Library

answered Oct 17 '22 01:10

Peter Bridger

Peter, I'd like to introduce you to two concepts in security:

Blacklisting - Disallow things you know are bad.

Whitelisting - Allow things you know are good.

While both have their uses, blacklisting is insecure by design.

What you are asking for is, in fact, blacklisting. As soon as there is an alternative to <script> that your blacklist does not cover (such as <img src="bad" onerror="hack()"/>), it gets through, and you cannot anticipate every such alternative.

Whitelisting, on the other hand, allows you to specify the exact conditions you are allowing.

For example, you would have the following rules:

allow only these tags: b, i, u, img
allow only these attributes: src, href, style

That is just the theory. In practice, you must parse the HTML accordingly, hence the need for a proper HTML parser.
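As a rough illustration of those rules running on top of a real parser, here is a minimal sketch using Python's standard `html.parser` (chosen so the example is self-contained; in C# the same structure would sit on whatever HTML parser or sanitizer library you pick). All names here (`WhitelistSanitizer`, `sanitize`, `is_safe_url`) are hypothetical. Two assumptions go beyond the rules above: `style` is left out because CSS values would need their own filter, and URL values are kept only when they are absolute http/https, since an allowed attribute can still carry a `javascript:` URL.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "u", "img"}
ALLOWED_ATTRS = {"src", "href"}     # assumption: style omitted, see lead-in
VOID_TAGS = {"img"}                 # elements that take no closing tag
DROP_CONTENT = {"script", "style"}  # drop the text inside these entirely


def is_safe_url(value: str) -> bool:
    """Accept only absolute http(s) URLs (assumed policy for this sketch)."""
    # Browsers ignore whitespace and control characters inside a scheme,
    # so remove them before checking the prefix.
    cleaned = "".join(ch for ch in value if ch > " ").lower()
    return cleaned.startswith(("http://", "https://"))


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text content
        kept = [
            f' {name}="{escape(value)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS
            and value is not None
            and is_safe_url(value)
        ]
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="hack()">hi</b>'))  # prints: <b>hi</b>
print(sanitize('<img src="bad" onerror="hack()"/>'))  # prints: <img>
```

The sketch does not balance unclosed tags or handle every parsing edge case, which is one more reason to prefer a well-tested library for production use.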

answered Oct 17 '22 00:10

Christian


Tags:

javascript

html

c#

asp.net

xss

Peter Bridger
