What does LIBXML_NOENT do (and why isn't it called LIBXML_ENT)?

In PHP, one can pass optional arguments to various XML parsers, one of them being LIBXML_NOENT. The documentation has this to say about it:

LIBXML_NOENT (integer)
Substitute entities

"Substitute entities" isn't very informative (which entities? when are they substituted?). NOENT reads like shorthand for NO_ENTITIES or NO_EXTERNAL_ENTITIES, so it seems fair to assume that this flag disables the parsing of (external) entities.

But that is not the case:

$xml = '<!DOCTYPE test [<!ENTITY c PUBLIC "bar" "/etc/passwd">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NOENT);
echo $dom->textContent;

The result is that the contents of /etc/passwd are echoed. Without the LIBXML_NOENT argument, this is not the case.

For internal (non-external) entities, the flag doesn't seem to have any effect. Example:

$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->textContent;

The result of this code is "TEST", both with and without LIBXML_NOENT.

The flag doesn't seem to have any effect on predefined entities such as &lt; either.

So my questions are:

  • What exactly does the LIBXML_NOENT flag do?
  • Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?
  • Is there a flag that actually prevents the parsing of all entities?
asked Aug 06 '16 18:08 by tim


1 Answer

Q: What exactly does the LIBXML_NOENT flag do?

The flag enables the substitution of XML entity references (general entities such as &c; above), whether internal or external.

Q: Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?

The name is indeed misleading. I think that NOENT simply means that the node tree of the parsed document won't contain any entity reference nodes, because the parser replaces each reference with the entity's content. Without NOENT, the parser creates DOMEntityReference nodes for entity references.
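
The difference in the node tree can be inspected directly. The following is a minimal sketch, assuming an internal entity only so that nothing is read from disk; describeFirstChild() is a hypothetical helper written for this illustration, not part of PHP or libxml:

```php
<?php
// Hypothetical helper (not a PHP built-in): parse $xml with the given libxml
// flags and report the class of the root element's first child, plus the
// serialized root element.
function describeFirstChild(string $xml, int $flags): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml, $flags);
    $root = $dom->documentElement;

    return [
        'class' => get_class($root->firstChild),
        'xml'   => $dom->saveXML($root),
    ];
}

// Internal entity only, so no file or network access is involved.
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';

foreach (['no flag' => 0, 'LIBXML_NOENT' => LIBXML_NOENT] as $label => $flags) {
    $info = describeFirstChild($xml, $flags);
    echo $label, ': ', $info['class'], ' / ', $info['xml'], "\n";
}
```

If the explanation above is right, the first line should report a DOMEntityReference child with the reference left intact, and the second a plain text node with the entity already replaced.');
assert($b['class'] === 'DOMText');
assert($b['xml'] === '<test>TEST</test>');
</test>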

Q: Is there a flag that actually prevents the parsing of all entities?

LIBXML_NOENT enables the substitution of all entity references. If you don't want entities to be expanded, simply omit the flag. For example:

$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->saveXML();

prints

<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY c "TEST">
]>
<test>&c;</test>

It seems that textContent replaces entities on its own, which might be a peculiarity of the PHP bindings. Without LIBXML_NOENT, this leads to different behavior for internal and external entities, because external entities won't be loaded.
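
To see the two views side by side, here is a small sketch that parses the same internal-entity document without LIBXML_NOENT and compares textContent with the serialized element. The variable names are only illustrative:

```php
<?php
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';

$dom = new DOMDocument();
$dom->loadXML($xml); // deliberately no LIBXML_NOENT

// textContent resolves the internal entity when it is read...
$text = $dom->documentElement->textContent;

// ...while the tree itself still holds the entity reference.
$serialized = $dom->saveXML($dom->documentElement);

echo 'textContent: ', $text, "\n";
echo 'saveXML:     ', $serialized, "\n";
```

So when checking whether entities were actually substituted in the tree, saveXML() (or the node classes) is the more reliable indicator than textContent.');
</test>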

answered Sep 25 '22 15:09 by nwellnhof