As per the HTML Purifier smoketest, 'malformed' URIs are occasionally discarded to leave behind an attribute-less anchor tag, e.g.
<a href="javascript:document.location='http://www.google.com/'">XSS</a>
becomes <a>XSS</a>
...as well as occasionally being stripped down to the protocol, e.g.
<a href="http://1113982867/">XSS</a>
becomes <a href="http:/">XSS</a>
While that's unproblematic, per se, it's a bit ugly. Instead of trying to strip these out with regular expressions, I was hoping to use HTML Purifier's own library capabilities / injectors / plug-ins / whathaveyou.
Conditionally removing an attribute in HTMLPurifier is easy. Here the library offers the class HTMLPurifier_AttrTransform
with the method confiscateAttr()
.
While I don't personally use the functionality of confiscateAttr()
, I do use an HTMLPurifier_AttrTransform
as per this thread to add target="_blank"
to all anchors.
// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_Target();
// purify down here
HTMLPurifier_AttrTransform_Target
is a very simple class, of course.
class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
// I could call $this->confiscateAttr() here to throw away an
// undesired attribute
$attr['target'] = '_blank';
return $attr;
}
}
That part works like a charm, naturally.
Perhaps I'm not squinting hard enough at HTMLPurifier_TagTransform
, or am looking in the wrong place(s), or generally amn't understanding it, but I can't seem to figure out a way to conditionally remove elements.
Say, something to the effect of:
// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor = $htmlDef->addElementHandler('a');
$anchor->elem_transform_post[] = new HTMLPurifier_ElementTransform_Cull();
// add target as per 'point of reference' here
// purify down here
With the Cull class extending something that has a confiscateElement()
ability, or comparable, wherein I could check for a missing href
attribute or a href
attribute with the content http:/
.
I understand I could create a filter, but the examples (Youtube.php and ExtractStyleBlocks.php) suggest I'd be using regular expressions in that, which I'd really rather avoid, if it is at all possible. I'm hoping for an onboard or quasi-onboard solution that makes use of HTML Purifier's excellent parsing capabilities.
Returning null
in a child-class of HTMLPurifier_AttrTransform
unfortunately doesn't cut it.
Anyone have any smart ideas, or am I stuck with regexes? :)
Success! Thanks to Ambush Commander and mcgrailm in another question, I am now using a hilariously simple solution:
// a bit of context
$htmlDef = $this->configuration->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');
// HTMLPurifier_AttrTransform_RemoveLoneHttp strips 'href="http:/"' from
// all anchor tags (see first post for class detail)
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_RemoveLoneHttp();
// this is the magic! We're making 'href' a required attribute (note the
// asterisk) - now HTML Purifier removes <a></a>, as well as
// <a href="http:/"></a> after HTMLPurifier_AttrTransform_RemoveLoneHttp
// is through with it!
$htmlDef->addAttribute('a', 'href*', new HTMLPurifier_AttrDef_URI());
It works, it works, bahahahaHAHAHAHAnhͥͤͫ̀ğͮ͑̆ͦó̓̉ͬ͋h́ͧ̆̈́̉ğ̈́͐̈a̾̈́̑ͨô̔̄̑̇g̀̄h̘̝͊̐ͩͥ̋ͤ͛g̦̣̙̙̒̀ͥ̐̔ͅo̤̣hg͓̈́͋̇̓́̆a͖̩̯̥͕͂̈̐ͮ̒o̶ͬ̽̀̍ͮ̾ͮ͢҉̩͉̘͓̙̦̩̹͍̹̠̕g̵̡͔̙͉̱̠̙̩͚͑ͥ̎̓͛̋͗̍̽͋͑̈́̚...! * manic laughter, gurgling noises, keels over with a smile on her face *
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With