In PHP, one can pass optional arguments to various XML parsers, one of them being LIBXML_NOENT. The documentation has this to say about it:
LIBXML_NOENT (integer)
Substitute entities
"Substitute entities" isn't very informative (what entities? when are they substituted?). But I think it's fair to assume that NOENT is short for NO_ENTITIES or NO_EXTERNAL_ENTITIES, so it seems reasonable to conclude that this flag disables the parsing of (external) entities.
But that is not the case:
$xml = '<!DOCTYPE root [<!ENTITY c PUBLIC "bar" "/etc/passwd">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NOENT);
echo $dom->textContent;
The result is that the content of /etc/passwd is echoed. Without the LIBXML_NOENT argument this is not the case.
For non-external entities, the flag doesn't seem to have any effect. Example:
$xml = '<!DOCTYPE root [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->textContent;
The result of this code is "TEST", with and without LIBXML_NOENT.
The flag doesn't seem to have any effect on pre-defined entities such as &lt; either.
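So my questions are:
1. What exactly does the LIBXML_NOENT flag do?
2. Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?
3. Is there a flag that actually prevents the parsing of all entities?
Q: What exactly does the LIBXML_NOENT flag do?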
The flag enables the substitution of XML entity references, whether the entities are internal or external.
Q: Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?
The name is indeed misleading. I think that NOENT simply means that the node tree of the parsed document won't contain any entity nodes, so the parser will substitute entities. Without NOENT, the parser creates DOMEntityReference nodes for entity references.
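One way to observe this is to look at the classes of the root element's child nodes after parsing. The following is a minimal sketch, assuming PHP's DOM extension is loaded; the helper name childNodeClasses is made up for illustration and is not part of any API.

```php
<?php
// Hypothetical helper (not a library function): parse $xml with the given
// libxml options and return the class name of each child of the root element.
function childNodeClasses(string $xml, int $options = 0): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml, $options);

    $classes = [];
    foreach ($dom->documentElement->childNodes as $child) {
        $classes[] = get_class($child);
    }
    return $classes;
}

$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';

// Without the flag, the reference should be kept as an entity reference node.
print_r(childNodeClasses($xml));

// With the flag, the parser should substitute the entity, leaving a text node.
print_r(childNodeClasses($xml, LIBXML_NOENT));
```

If the explanation above holds, the first call reports a DOMEntityReference child and the second a DOMText child, which matches the reading of NOENT as "no entity nodes in the tree".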
Q: Is there a flag that actually prevents the parsing of all entities?
LIBXML_NOENT enables the substitution of all entity references. If you don't want entities to be expanded, simply omit the flag. For example:
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->saveXML();
prints
<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY c "TEST">
]>
<test>&c;</test>
It seems that textContent replaces entities on its own, which might be a peculiarity of the PHP bindings. Without LIBXML_NOENT, this leads to different behavior for internal and external entities, because the latter won't be loaded.
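To see the difference between the serialized tree and textContent side by side, a small comparison like the following may help. This is only a sketch for the internal-entity case; it serializes just the root element (by passing it to saveXML) to keep the DOCTYPE out of the output, and reads textContent from the root element rather than from the document.

```php
<?php
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';

// Parse the same document once without options and once with LIBXML_NOENT.
foreach (['default' => 0, 'LIBXML_NOENT' => LIBXML_NOENT] as $label => $options) {
    $dom = new DOMDocument();
    $dom->loadXML($xml, $options);

    // Serialized form of the root element only.
    $serialized = $dom->saveXML($dom->documentElement);
    // Text content as computed by the DOM extension.
    $text = $dom->documentElement->textContent;

    echo "$label: serialized=$serialized textContent=$text\n";
}
```

For an internal entity, the serialized form should only contain the expanded text when LIBXML_NOENT is set, while textContent is expected to show the expanded text in both cases.');
assert($d1->documentElement->textContent === 'TEST');
$d2 = new DOMDocument();
$d2->loadXML($xml, LIBXML_NOENT);
assert($d2->saveXML($d2->documentElement) === '<test>TEST</test>');
assert($d2->documentElement->textContent === 'TEST');
</test>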