In a web project, we use OWASP ESAPI for PHP for output encoding. In a few places, we would like to allow a small subset of HTML for basic formatting (for example, <i> and <b>), while disallowing all other tags and special characters, so that they are entity-encoded using the &...; syntax.
I see the following possibilities to achieve this:
In particular, I need the following tags and attributes to be white-listed:
<br>, <i>, <b>, <u>, <big>, <small>, <sub>, <sup>, <font color="...">, <ul> + <li>, <ol> + <li>
Please note that our application is security-critical. Whatever method we implement must accept only the tags above (and perhaps a few more formatting-only tags); everything else has to be properly entity-encoded. It should be easy to verify beyond doubt that this holds, just by looking at the (simple) code or an explanation of it. The shorter the code, the easier the review. Fully hand-crafted encoders are a poor fit for this.
It sounds like what you are actually looking for is HTMLPurifier:
http://htmlpurifier.org/
FWIW I am not affiliated with HTMLPurifier at all, and I am the Project Leader of the OWASP ESAPI project.
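As a rough illustration, the whitelist from the question might be configured in HTML Purifier along these lines. This is a minimal sketch, not a reviewed implementation. It assumes the library is installed and loadable through its bundled HTMLPurifier.auto.php (the include path is an assumption), and that the default XHTML 1.0 Transitional doctype is in use so that legacy elements such as <font> and <u> are available. The $dirty string is a hypothetical input. By default HTML Purifier drops disallowed tags; the Core.EscapeInvalidTags directive asks it to entity-encode them instead, which seems closer to what the question asks for.

```php
<?php
// Assumption: HTML Purifier is installed and this include path resolves.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Whitelist: only these elements (and the color attribute on <font>) are allowed.
$config->set('HTML.Allowed', 'br,i,b,u,big,small,sub,sup,font[color],ul,ol,li');

// Entity-encode disallowed tags instead of silently removing them.
$config->set('Core.EscapeInvalidTags', true);

$purifier = new HTMLPurifier($config);

// Hypothetical untrusted input, for illustration only.
$dirty = '<b>bold</b> <script>alert(1)</script> <font color="red">red</font>';

echo $purifier->purify($dirty);
```

The whole policy lives in the two set() calls, which keeps the part that needs review short. The exact output depends on the library version and doctype settings, so it is worth confirming the behaviour against a set of known-bad inputs before relying on it.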