In a web project, we use OWASP ESAPI in PHP for output encoding. In some places, we'd like to allow a subset of HTML for minor formatting (for example, <i> and <b>), while disallowing all other tags and special characters, so that they are entity-encoded using the &...; syntax.
I see the following possibilities to achieve this:
In particular, I need the following tags and attributes to be white-listed:
<br>
<i>
<b>
<u>
<big>
<small>
<sub>
<sup>
<font color="...">
<ul> + <li>
<ol> + <li>
Please note that our application is security-critical. Any method we implement should accept only the tags above (and perhaps a few more formatting-only tags); everything else has to be entity-encoded properly. It should be easy to verify that this holds just by looking at the (simple) code or an explanation of it. The shorter the code, the easier the review. Fully hand-crafted encoders are not a good fit for this.
It sounds like what you are actually looking for is HTMLPurifier: http://htmlpurifier.org/
FWIW I am not affiliated with HTMLPurifier at all, and I am the Project Leader of the OWASP ESAPI project.
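As a rough illustration, the whitelist from the question might be expressed with HTMLPurifier roughly as follows. This is only a sketch under assumptions: the include path, the sample input, and the choice to disable the definition cache are illustrative, and the configuration should be reviewed against the HTMLPurifier documentation before being relied on in a security-critical application. By default HTMLPurifier drops disallowed tags; the Core.EscapeInvalidTags directive asks it to write them back as escaped text instead, which is closer to the entity-encoding behaviour described in the question.

```php
<?php
// Assumption: HTMLPurifier is installed and this path resolves to its bundled autoloader.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Whitelist taken from the question: elements, with allowed attributes in brackets.
$config->set('HTML.Allowed', 'br,i,b,u,big,small,sub,sup,font[color],ul,ol,li');

// Write disallowed tags back as escaped text instead of silently dropping them.
$config->set('Core.EscapeInvalidTags', true);

// Illustrative only: disable the definition cache so no writable cache directory is needed.
$config->set('Cache.DefinitionImpl', null);

$purifier = new HTMLPurifier($config);

// Hypothetical untrusted input for demonstration.
$dirty = '<b>bold</b> <script>alert(1)</script> <font color="red" onclick="x()">hi</font>';
$clean = $purifier->purify($dirty);

echo $clean, "\n";
```

The whole policy lives in the single HTML.Allowed string, which keeps the part that reviewers need to check short.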