I'm looking for best practices for performing strict (whitelist) validation/filtering of user-submitted HTML.
The main purpose is to filter out XSS and similar nasties that may be entered via web forms. A secondary purpose is to limit breakage of HTML content entered by non-technical users, e.g. via a WYSIWYG editor that has an HTML view.
I'm considering using HTML Purifier, or rolling my own by using an HTML DOM parser to go through a process like HTML(dirty)->DOM(dirty)->filter->DOM(clean)->HTML(clean).
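Roughly, the roll-your-own version I have in mind would look something like the sketch below (using PHP's DOMDocument; filter_html and the whitelist arrays are just placeholder names, and href values would still need separate scheme checking):

```php
<?php
// Rough sketch of HTML(dirty) -> DOM -> filter -> HTML(clean) with a whitelist.
function filter_html(string $dirty): string
{
    $allowedTags  = ['p', 'br', 'b', 'i', 'em', 'strong', 'ul', 'ol', 'li', 'a'];
    $allowedAttrs = ['href', 'title'];

    $doc = new DOMDocument();
    // @ suppresses warnings about malformed user markup; the XML prolog forces UTF-8.
    @$doc->loadHTML('<?xml encoding="utf-8"?>' . $dirty);

    $xpath = new DOMXPath($doc);
    // Snapshot the node list so removals don't disturb the traversal.
    foreach (iterator_to_array($xpath->query('//body//*')) as $node) {
        if (!in_array(strtolower($node->nodeName), $allowedTags, true)) {
            // Unwrap disallowed elements: keep their children, drop the tag itself.
            // (A real filter would drop <script>/<style> contents entirely instead.)
            while ($node->firstChild) {
                $node->parentNode->insertBefore($node->firstChild, $node);
            }
            $node->parentNode->removeChild($node);
            continue;
        }
        // Strip attributes that aren't whitelisted (event handlers, style, etc.).
        for ($i = $node->attributes->length - 1; $i >= 0; $i--) {
            $attr = $node->attributes->item($i);
            if (!in_array(strtolower($attr->nodeName), $allowedAttrs, true)) {
                $node->removeAttribute($attr->nodeName);
            }
        }
    }

    // Serialise only the body's children, not the implied <html>/<body> wrapper.
    $body = $doc->getElementsByTagName('body')->item(0);
    if (!$body) {
        return '';
    }
    $clean = '';
    foreach ($body->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    return $clean;
}
```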
Can you describe successes with these or any easier strategies that are also effective? Any pitfalls to watch out for?
I've tested all the exploits I know of on HTML Purifier, and it did very well. It filters not only HTML, but also CSS and URLs.
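For reference, the basic HTML Purifier setup is only a few lines (a minimal sketch using its default configuration; the HTML.Allowed and URI.AllowedSchemes values shown are just examples, not a recommendation):

```php
<?php
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Example whitelist; adjust to whatever your application actually needs.
$config->set('HTML.Allowed', 'p,br,b,i,em,strong,ul,ol,li,a[href|title]');
// A scheme whitelist kills javascript:, data:, vbscript: and friends in one go.
$config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($dirtyHtml);
```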
Once you narrow elements and attributes down to innocent ones, the remaining pitfalls are in attribute content: javascript: pseudo-URLs (IE allows tab characters in the protocol name, so a javascript: URL with an embedded tab still works) and CSS properties that trigger JS.
Parsing of URLs may be tricky, e.g. these are valid: http://spoof.com:80@evil.com (the real host is evil.com; everything before the @ is user info) or the protocol-relative //evil.com.
Internationalized domains (IDN) can be written in two ways – Unicode and punycode.
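If you do end up hand-rolling this part, a scheme whitelist applied after normalisation is much safer than trying to blacklist javascript:. A rough sketch (is_safe_url is just an illustrative helper; it deliberately rejects user-info and scheme-relative URLs):

```php
<?php
// Illustrative helper: accept only http(s) URLs with a real host.
function is_safe_url(string $url): bool
{
    // Strip ASCII control characters and spaces first
    // (covers the tab-in-the-protocol "java<TAB>script:" trick).
    $url = preg_replace('/[\x00-\x20]+/', '', $url);

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // rejects //evil.com and javascript:... alike
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    // Reject user-info outright, so http://spoof.com:80@evil.com-style lookalikes fail.
    if (isset($parts['user']) || isset($parts['pass'])) {
        return false;
    }
    // IDN handling (Unicode vs. punycode) is out of scope here; compare hosts
    // after idn_to_ascii() if you need to whitelist or blacklist domains.
    return true;
}
```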
Go with HTML Purifier – it has most of these worked out. If you just want to fix broken HTML, use HTML Tidy (it's available as a PHP extension).
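For the Tidy route, a repair-only pass looks roughly like this (using ext/tidy; the config options shown are just common choices, and note it does no security filtering at all):

```php
<?php
// Repair-only pass with ext/tidy: fixes broken markup, does NOT sanitise it.
$config = [
    'output-xhtml'   => true,
    'show-body-only' => true,
    'wrap'           => 0,
];

$tidy = new tidy();
$tidy->parseString($dirtyHtml, $config, 'utf8');
$tidy->cleanRepair();

$fixedHtml = tidy_get_output($tidy);
```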