
XML validation against given DTD in PHP

In PHP, I am trying to validate an XML document using a DTD specified by my application - not by the externally fetched XML document. The validate method in the DOMDocument class seems to only validate using the DTD specified by the XML document itself, so this will not work.

Can this be done, and how, or do I have to translate my DTD to an XML schema so I can use the schemaValidate method?

(This seems to have been asked in "Validate XML using a custom DTD in PHP", but without a correct answer, since the solution there relies only on the DTD specified by the target XML.)

Allanrbo asked Aug 13 '09 19:08





1 Answer

Note: XML validation could be subject to the Billion Laughs attack, and similar DoS vectors.

This essentially does what rojoca mentioned in his comment:

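<?php

// The incoming document references its own DTD (foo.dtd), which we want to ignore.
$xml = <<<END
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
    <bar>baz</bar>
</foo>
END;

$root = 'foo';

// Parse the original document.
$old = new DOMDocument;
$old->loadXML($xml);

// Create an empty document whose doctype points at our own DTD (bar.dtd).
// Empty strings are passed instead of null, because PHP 8.1+ deprecates
// null for string parameters of built-in functions.
$creator = new DOMImplementation;
$doctype = $creator->createDocumentType($root, '', 'bar.dtd');
$new = $creator->createDocument(null, '', $doctype);
$new->encoding = "utf-8";

// Deep-copy the root element (with everything in it) into the new document.
$oldNode = $old->getElementsByTagName($root)->item(0);
$newNode = $new->importNode($oldNode, true);
$new->appendChild($newNode);

// validate() returns true if the document conforms to bar.dtd, false otherwise.
$isValid = $new->validate();

?>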

This validates the document against bar.dtd; validate() returns true if the document conforms and false otherwise.

You can't just call $new->loadXML(), because that would reset the doctype to the one declared in the original XML, and the doctype property of a DOMDocument object is read-only. So you have to copy the root node (with everything in it) into a new DOM document.

I only just had a go with this myself, so I'm not entirely sure if this covers everything, but it definitely works for the XML in my example.
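For reuse, the document-copying approach above could be wrapped in a function that also collects libxml's error messages instead of letting them surface as PHP warnings. This is only a minimal sketch: validateAgainstDtd is a hypothetical helper name, the throwaway DTD and temporary file exist purely for the demo, and $dtdPath is assumed to be a path or URI that libxml can open on your platform.

```php
<?php

// Hypothetical helper (not a PHP built-in): validates $xml against the DTD at
// $dtdPath, ignoring whatever DOCTYPE the XML itself declares.
// Returns ['valid' => bool, 'errors' => string[]].
function validateAgainstDtd(string $xml, string $root, string $dtdPath): array
{
    // Collect libxml errors in a list instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $valid = false;
    $old = new DOMDocument;
    // LIBXML_NONET keeps the parser from fetching anything over the network.
    if ($old->loadXML($xml, LIBXML_NONET) && $old->documentElement !== null) {
        // Same steps as in the answer: new document, our doctype, imported root.
        $creator = new DOMImplementation;
        $doctype = $creator->createDocumentType($root, '', $dtdPath);
        $new = $creator->createDocument(null, '', $doctype);
        $new->appendChild($new->importNode($old->documentElement, true));
        $valid = $new->validate();
    }

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }

    // Restore the previous error-handling mode for the rest of the application.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ['valid' => $valid, 'errors' => $errors];
}

// Demo with a throwaway DTD written to a temporary file (assumption for the example).
$dtdPath = tempnam(sys_get_temp_dir(), 'dtd');
file_put_contents($dtdPath, "<!ELEMENT foo (bar)>\n<!ELEMENT bar (#PCDATA)>\n");

$result = validateAgainstDtd('<foo><bar>baz</bar></foo>', 'foo', $dtdPath);
var_dump($result['valid']);

$result = validateAgainstDtd('<foo><qux/></foo>', 'foo', $dtdPath);
print_r($result['errors']);

unlink($dtdPath);
```

Returning the messages alongside the boolean makes it easier to log why an externally fetched document was rejected.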

Of course, the quick-and-dirty solution would be to first get the XML as a string, search and replace the original DTD by your own DTD and then load it.
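A rough sketch of that string-replacement variant might look like the following. The function name validateWithReplacedDoctype and the regular expressions are my own assumptions, not part of any library; the pattern only handles a plain DOCTYPE (optionally with an internal subset) and would be fooled by a DOCTYPE-like string inside a comment or CDATA section. It also assumes PHP 7.4+ for the arrow functions and a $dtdPath without double quotes.

```php
<?php

// Hypothetical helper: swaps the DOCTYPE in the raw XML string for one that
// points at our own DTD, then loads and validates the result.
function validateWithReplacedDoctype(string $xml, string $root, string $dtdPath): bool
{
    $doctype = sprintf('<!DOCTYPE %s SYSTEM "%s">', $root, $dtdPath);

    // Crude match for a DOCTYPE declaration, with or without an internal subset.
    $pattern = '/<!DOCTYPE\s[^>\[]*(\[.*?\])?\s*>/s';
    // A callback is used so that "$" or "\" in the path is not treated as a backreference.
    $replaced = preg_replace_callback($pattern, fn() => $doctype, $xml, 1, $count);
    if ($replaced === null) {
        return false; // regex engine error
    }

    if ($count === 0) {
        // No DOCTYPE found: insert ours right after the XML declaration, if any.
        $replaced = preg_replace_callback(
            '/^(<\?xml.*?\?>)?/s',
            fn(array $m) => ($m[1] ?? '') . $doctype,
            $xml,
            1
        );
    }

    // Keep parser and validation messages out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $valid = $doc->loadXML($replaced, LIBXML_NONET) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $valid;
}

// Demo with a throwaway DTD written to a temporary file (assumption for the example).
$dtdPath = tempnam(sys_get_temp_dir(), 'dtd');
file_put_contents($dtdPath, "<!ELEMENT foo (bar)>\n<!ELEMENT bar (#PCDATA)>\n");

$xml = '<!DOCTYPE foo SYSTEM "foo.dtd"><foo><bar>baz</bar></foo>';
var_dump(validateWithReplacedDoctype($xml, 'foo', $dtdPath));

unlink($dtdPath);
```

Note that replacing the DOCTYPE also discards any internal subset, so entities declared there would no longer resolve. The document-copying approach is probably the safer default.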

mercator answered Nov 15 '22 18:11
