
How to prevent Javascript injection attacks within user-generated HTML

I am saving user-submitted HTML in a database, and I must prevent JavaScript injection attacks. The most pernicious one I have seen is JavaScript hidden in a style attribute, as in style="expression(...)".

In addition, a fair amount of valid user content will include special characters and XML constructs, so I'd like to avoid a whitelist approach (listing every allowable HTML element and attribute) if possible.

Examples of JavaScript attack strings:

1. "Hello, I have a <script>alert("bad!")</script> problem with the <dog> element..."
2. "Hi, this <b style="width:expression(alert('bad!'))">dog</b> is black."

Is there a way to prevent such JavaScript, and leave the rest intact?

The only solution I have so far is to use a regular expression to remove certain patterns. It solves case 1, but not case 2.
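To make the gap concrete, here is a minimal sketch of that kind of regex filter. It is written in Python for brevity rather than C#, and the pattern `SCRIPT_BLOCK` is an assumption made for illustration, not the expression actually in use. It strips whole script elements, so it catches case 1, but it never looks inside attributes, so case 2 passes through untouched.

```python
import re

# Hypothetical blacklist pattern: removes <script>...</script> blocks only.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)

case_1 = 'Hello, I have a <script>alert("bad!")</script> problem with the <dog> element...'
case_2 = "Hi, this <b style=\"width:expression(alert('bad!'))\">dog</b> is black."

cleaned_1 = SCRIPT_BLOCK.sub("", case_1)
cleaned_2 = SCRIPT_BLOCK.sub("", case_2)

print(cleaned_1)  # the script element is gone, <dog> is kept
print(cleaned_2)  # returned unchanged: the expression(...) payload survives
```

Adding more patterns for style attributes, event handlers, and script URLs is possible, but every pattern that gets missed is an opening, which is the usual argument against blacklist filtering.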

The environment is essentially the Microsoft stack:

  • SQL Server 2005
  • C# on .NET 3.5 (ASP.NET)
  • JavaScript and jQuery.

I would like the chokepoint to be the ASP.NET layer, since anyone can craft a bad HTTP request.

Edit

Thanks for the links, everyone. Assuming that I can define my list (the content will include many mathematical and programming constructs, so a whitelist is going to be very annoying), I still have a question:

What kind of parser will allow me to remove just the "bad" parts? The bad part could be an entire element, but what about scripts that reside in attributes? I can't remove <a href>s willy-nilly.

Jeff Meatball Yang asked Jun 02 '09 21:06




1 Answer

You think that's it? Check this out.

Whatever approach you take, you definitely need to use a whitelist. It's the only way to even come close to being safe about what you're allowing on your site.
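A minimal sketch of the whitelist idea follows, written in Python with only the standard library rather than in C#. Everything specific in it is an assumption chosen for illustration: the tag list, the attribute list, the allowed URL prefixes, and the names `AllowListSanitizer` and `sanitize`. The point it tries to show is that a parser-driven whitelist handles both question cases at once: allowed tags are rebuilt from scratch with only allowed attributes, and anything else (including `<script>` and made-up tags like `<dog>`) is escaped so it displays as text instead of being executed or lost.

```python
import html
import re
from html.parser import HTMLParser

# Illustrative allow-lists (assumptions, not a recommended policy).
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a", "code", "pre",
                "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class AllowListSanitizer(HTMLParser):
    """Rebuilds allowed tags; escapes everything else as plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._out = []
        self._open = []  # allowed tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            # Unknown tag: show its source text instead of rendering it.
            self._out.append(html.escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops style, onclick, and anything else unlisted
            value = value or ""
            if name == "href":
                # Remove whitespace/control characters before checking the scheme.
                value = re.sub(r"[\x00-\x20]", "", value)
                if not value.lower().startswith(SAFE_URL_PREFIXES):
                    continue  # drops javascript: and other unlisted schemes
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self._out.append(f"<{tag}{''.join(kept)}>")
        self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag in self._open:
            # Close any allowed tags left open inside this one, then the tag itself.
            while self._open:
                top = self._open.pop()
                self._out.append(f"</{top}>")
                if top == tag:
                    break
        else:
            self._out.append(html.escape(f"</{tag}>"))

    def handle_data(self, data):
        self._out.append(html.escape(data))

    def result(self):
        # Close whatever allowed tags the input left open.
        return "".join(self._out) + "".join(f"</{t}>" for t in reversed(self._open))


def sanitize(fragment: str) -> str:
    parser = AllowListSanitizer()
    parser.feed(fragment)
    parser.close()  # flushes any buffered trailing text
    return parser.result()


print(sanitize("Hi, this <b style=\"width:expression(alert('bad!'))\">dog</b> is black."))
print(sanitize('Hello, I have a <script>alert("bad!")</script> problem with the <dog> element...'))
```

Comments, doctype declarations, and processing instructions are silently dropped because the parser's default handlers ignore them. Because unknown markup is escaped rather than deleted, content full of angle brackets (math, code samples) stays readable, and the allow-list only has to cover the formatting you actually want rendered.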

EDIT:

I'm not familiar with .NET, unfortunately, but you can read about Stack Overflow's own battle with XSS (https://blog.stackoverflow.com/2008/06/safe-html-and-xss/) and look at the code that was written to sanitize HTML posted on that site: Archive.org link. You will probably need to adapt it because your whitelist is bigger, but it should get you started.

Paolo Bergantino answered Sep 19 '22 05:09