
HTML Purifier removes IDs even with $config->set('Attr.EnableID', true);

I'm having a problem with HTML Purifier: it removes the IDs from heading elements even though I've set the configuration options that should allow them.

Right now I'm using:

// set up HTML Purifier for user inputs
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
$config->set('Attr.EnableID', true);
$config->set('HTML.Trusted', true);

$purifier = new HTMLPurifier($config);

I then feed it a string like:

<h6 id="1843804297">This is a title</h6><h5 id="1979691494">This one too.</h5><h3 id="932393874">I think you see where this is going.</h3>

I have also tried, without success, whitelisting heading elements with IDs, and even directly manipulating the defaults stored in the $config object:

$config->def->defaults['Attr.EnableID'] = true;

The IDs are important because they are assigned by a PHP script, stored in MySQL, and later picked up by a JS navigation system. They need to come from the user's input because they often stay the same across subsequent content updates.

Barney D. asked Jan 16 '14 14:01


1 Answer

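I believe that's because IDs that begin with a digit are invalid in HTML 4, which is the doctype you configured, so HTML Purifier drops them even with Attr.EnableID turned on. From the HTML 4.01 specification: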

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
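The rule quoted above fits in one regular expression. The helper below, `isValidHtml4Id()`, is a hypothetical name (not part of HTML Purifier) and is only a minimal sketch for checking your own IDs against the spec rule; HTML Purifier's internal validator is implemented differently.

```php
<?php

/**
 * Hypothetical helper: checks an ID against the HTML 4.01 ID/NAME token rule,
 * i.e. one letter first, then any letters, digits, '-', '_', ':' or '.'.
 * \A and \z anchor the whole string (no trailing newline allowed).
 */
function isValidHtml4Id(string $id): bool
{
    return preg_match('/\A[A-Za-z][A-Za-z0-9_:.-]*\z/', $id) === 1;
}

// The numeric ID from the question fails; a letter-prefixed version passes.
var_dump(isValidHtml4Id('1843804297'));   // bool(false)
var_dump(isValidHtml4Id('h1843804297'));  // bool(true)
```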

Try using IDs that begin with a letter, or change the Doctype.
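If the stored IDs must stay numeric, one workaround is to prefix them with a letter before handing the HTML to the purifier. The sketch below assumes the IDs always appear as double-quoted `id="…"` attributes preceded by whitespace, as in the question's example string; `prefixNumericIds()` and the default `h` prefix are hypothetical choices, and a regex is not a general HTML parser. HTML Purifier's own `Attr.IDPrefix` directive, which prepends a string to user-submitted IDs, may be an alternative worth checking in its documentation.

```php
<?php

/**
 * Hypothetical helper: prepends $prefix to every double-quoted id attribute
 * whose value starts with a digit, so it satisfies the HTML 4.01 ID rule.
 * Requiring whitespace before "id=" avoids touching attributes like data-id.
 */
function prefixNumericIds(string $html, string $prefix = 'h'): string
{
    $result = preg_replace_callback(
        '/(\s)id="([0-9][^"]*)"/',
        function (array $m) use ($prefix): string {
            // $m[1] is the preceding whitespace, $m[2] the original ID value.
            return $m[1] . 'id="' . $prefix . $m[2] . '"';
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to input.
    return $result ?? $html;
}

$input = '<h6 id="1843804297">This is a title</h6><h5 id="1979691494">This one too.</h5>';
echo prefixNumericIds($input), "\n";
// <h6 id="h1843804297">This is a title</h6><h5 id="h1979691494">This one too.</h5>
```

The purified output would then keep the prefixed IDs, so the JS navigation would need to look up `h` plus the stored number (or strip the prefix when reading it).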

answered Oct 23 '22 11:10