
PHP Markdown XSS Sanitizer

I'm looking for a simple PHP library that helps filter XSS vulnerabilities out of PHP Markdown output. For example, PHP Markdown will happily convert input such as the following into a live link:

[XSS Vulnerability](javascript:alert('xss'))

I've been doing some reading around and the best I've found on the subject here was this question.

Although HTML Purifier looks like the best (nearly the only) solution, I was wondering if there is anything simpler out there. HTML Purifier seems to be a bit robust for my needs, as well as a pain to configure, though it looks like it would work excellently once set up.

Is there anything else out there that is a little less robust and configurable but still does a solid job? Or should I just dig in and start configuring HTML Purifier for my needs?

EDIT FOR CLARITY: I'm not looking to cut corners or anything of the sort. HTML Purifier offers a lot of fine-grained control, and for a simple, small project that much control simply isn't needed, though using nothing isn't an option either. This is where I was coming from when asking for something simpler or less robust.

Also, a final note: I'm NOT looking for suggestions to use htmlspecialchars(), strip_tags() or anything of the sort. I already disallow embedded HTML in PHP Markdown Extra by sanitizing it in a similar fashion. I'm looking for ways to prevent XSS vulnerabilities in PHP Markdown OUTPUT.

Thanks.

anomareh asked Jan 18 '10 23:01


3 Answers

I've never heard of any tool other than HTML Purifier for doing that, and HTML Purifier does indeed have a good reputation.

Maybe it's "a bit robust" and "a pain to configure", yes; but it's also probably the most widely used and tested solution available in PHP, and those are important criteria when you have to choose such an important component.

Even if it means investing half a day to configure it properly, if I were in your situation, I would probably choose HTML Purifier.
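For a small project the configuration may turn out to be shorter than expected. The following is a minimal sketch, not a recommended or complete setup: it assumes HTML Purifier is installed through Composer (so `vendor/autoload.php` exists), and the tag list passed to `HTML.Allowed` is only a guess at what typical Markdown output needs. The variable `$markdownHtml` is a stand-in for whatever PHP Markdown returned.

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Allow only the elements and attributes Markdown normally emits.
// This particular list is an illustrative assumption; adjust it to your needs.
$config->set(
    'HTML.Allowed',
    'p,br,em,strong,code,pre,blockquote,ul,ol,li,h1,h2,h3,a[href|title],img[src|alt|title]'
);

// Restrict link and image URLs to a few schemes, so javascript: is dropped.
$config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'ftp' => true]);

$purifier = new HTMLPurifier($config);

// Stand-in for the HTML produced by PHP Markdown.
$markdownHtml = '<p><a href="javascript:alert(\'xss\')">XSS Vulnerability</a></p>';

echo $purifier->purify($markdownHtml);
```

The purifier runs on the Markdown output, not on the Markdown source, which matches the requirement in the question.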

Pascal MARTIN answered Nov 12 '22 15:11


There is no such thing as too robust. “Sanitising” HTML is hard. Any corners you cut to process it more simply are likely to result in exploits sneaking through. Even complicated old HTMLPurifier, with its best-of-breed reputation, has had multiple ways of sneaking dangerous markup through in the past!

However, if your text-markup solution is capable of outputting dangerous HTML then it is deficient and should be replaced IMO. If PHP Markdown allows javascript: URLs through then that's a pretty lamentable, basic flaw and I don't think I'd trust it to get anything else right.

bobince answered Nov 12 '22 14:11


I had a suggestion, and I asked on SO to find out whether it would work, but unfortunately that question was closed and marked as a duplicate of this one.

My suggestion is to modify Markdown's code so that links and image sources are only allowed to start with http://, https:// or ftp://, which covers all the commonly required protocols. If a link doesn't start with one of these, it should not be converted at all and should appear in the output as the plain text that was typed.
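As a rough illustration of that idea, here is a minimal sketch. The function names `is_allowed_link()` and `render_link()` are hypothetical and are not part of PHP Markdown; in practice the check would have to be wired into the library's own link and image handling. The sketch also assumes the link text is plain text, whereas real Markdown would parse it for inline formatting.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true only if $url starts with an allowed "scheme://".
 */
function is_allowed_link(string $url, array $allowedSchemes = ['http', 'https', 'ftp']): bool
{
    // Browsers ignore tabs, newlines and similar characters inside a scheme
    // (e.g. "java\tscript:"), so remove them before checking.
    $clean = preg_replace('/[\x00-\x20\x7F]+/', '', $url);
    if ($clean === null) {
        return false;
    }

    // Require an explicit scheme followed by "://"; relative and
    // protocol-relative URLs are rejected by this strict rule.
    if (!preg_match('~^([a-z][a-z0-9+.\-]*)://~i', $clean, $m)) {
        return false;
    }

    return in_array(strtolower($m[1]), $allowedSchemes, true);
}

/**
 * Hypothetical helper: emit an anchor for allowed URLs, otherwise
 * return the original Markdown source as escaped plain text.
 */
function render_link(string $text, string $url): string
{
    if (!is_allowed_link($url)) {
        return htmlspecialchars("[$text]($url)", ENT_QUOTES, 'UTF-8');
    }

    return sprintf(
        '<a href="%s">%s</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
    );
}
```

With this rule the example from the question, `[XSS Vulnerability](javascript:alert('xss'))`, is never turned into an anchor. An allowlist of schemes is used rather than a blocklist of known-bad ones, since a blocklist tends to miss variants.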

Vivek Ghaisas answered Nov 12 '22 14:11