How to speed up the XML DTD validation with PHP?

I am validating my XML against a DTD file that I have locally.

For that, I am doing:

$xml = $dmsMerrin.'/xml/'.$id.'/conversion.xml';
$dtd = $dmsMerrin.'/style_files/journalpublishing.dtd';

// Collect libxml errors internally instead of emitting PHP warnings.
// This must be enabled before load() so that parse errors are captured too.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->load($xml);

if ($dom->validate()) {
    $htmlDTDError .= "<h2>No Errors Found - The tested file is Valid !</h2>";
} else {
    $errors = libxml_get_errors();
    $htmlDTDError .= '<h2>Errors Found ('.count($errors).')</h2><ol>';

    foreach ($errors as $error) {
        $htmlDTDError .= '<li>'.$error->message.' on line '.$error->line.'</li>';
    }

    $htmlDTDError .= '</ol>';
    libxml_clear_errors();
}

libxml_use_internal_errors(false);

This takes about 30 seconds for an XML file with 1600 lines.

Is this normal? In my opinion it should be much faster.

As you can see, the DTD I am using is stored locally on the server.

Any idea? Thank you.

EDIT: By debugging and measuring the execution time, I noticed that it takes the same time whether my XML has 1600 lines or 150 lines, so the problem is not the XML size.

Milos Cuculovic asked Feb 18 '14 16:02


1 Answer

"And this takes about 30 seconds for an XML with 1600 lines."

That's an unusually long time, and it's likely due to misconfiguration.

"By debugging and checking the execution time, I noticed that it takes the same time if my XML has 1600 lines or 150 lines, so the problem is not the XML size."

For a tool that may provide more diagnostics here, try xmllint --valid. It will show, for example, errors for any DTDs that could not be retrieved.

It's very likely that the extra time is due to fetching resources, such as the DTD, needed to perform validation.
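One way to check this from PHP is to time load() and validate() separately, since the external DTD is typically fetched during validation. The following is only a rough sketch: the function name timeValidation and the tiny demo document are made up for illustration, and you would pass your own conversion.xml path instead.

```php
<?php
// Hypothetical helper (not part of the original code): measures how long
// DOMDocument::load() and DOMDocument::validate() each take for one file.
function timeValidation(string $xmlPath): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();

    $t0 = microtime(true);
    $loaded = $dom->load($xmlPath);
    $t1 = microtime(true);
    $valid = $loaded && $dom->validate();
    $t2 = microtime(true);

    $errorCount = count(libxml_get_errors());
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [
        'load_seconds'     => $t1 - $t0,
        'validate_seconds' => $t2 - $t1,
        'valid'            => $valid,
        'errors'           => $errorCount,
    ];
}

// Demo input: a made-up document with an internal DTD subset, so the
// sketch runs without any external files or network access.
$tmp = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $tmp,
    '<?xml version="1.0"?><!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>hi</note>'
);
$result = timeValidation($tmp);
unlink($tmp);

printf(
    "load: %.4fs, validate: %.4fs, valid: %s, errors: %d\n",
    $result['load_seconds'],
    $result['validate_seconds'],
    var_export($result['valid'], true),
    $result['errors']
);
```

If validate_seconds stays at roughly 30 seconds regardless of the document size, that points at resource fetching rather than at the validation work itself.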

For one of your files, confirm that the URL of the DTD can be retrieved quickly by testing with a tool like curl from the same server. Is it a complex DTD? Does it bring in other files? Especially, make sure that it never refers to resources that would have to be fetched from the web, or with hostnames where DNS resolves slowly.
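To see from within PHP which resources the parser actually asks for, one option is libxml_set_external_entity_loader() (available since PHP 5.4). The sketch below logs every requested system ID, serves a local stand-in for a known DTD, and refuses anything else. The document, the example.com URL, and the $localDtds map are assumptions for illustration only; in practice you would map the system IDs from your own DOCTYPE to local DTD copies.

```php
<?php
// Hypothetical document whose DOCTYPE points at a remote DTD URL.
$xml = <<<XML
<!DOCTYPE article PUBLIC "-//EXAMPLE//DTD Article//EN" "http://example.com/article.dtd">
<article>text</article>
XML;

// Assumed mapping from system ID to local DTD content (a stand-in for
// DTD files kept on the server).
$localDtds = [
    'http://example.com/article.dtd' => '<!ELEMENT article (#PCDATA)>',
];

$requested = [];

libxml_set_external_entity_loader(
    function ($publicId, $systemId, $context) use ($localDtds, &$requested) {
        $requested[] = $systemId;            // record what the parser wants
        if (!is_string($systemId) || !isset($localDtds[$systemId])) {
            return null;                     // unknown resource: do not fetch it
        }
        $stream = fopen('php://temp', 'r+');
        fwrite($stream, $localDtds[$systemId]);
        rewind($stream);
        return $stream;                      // serve the local copy instead
    }
);

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML($xml);
$valid = $dom->validate();
libxml_clear_errors();
libxml_use_internal_errors($previous);
libxml_set_external_entity_loader(null);     // restore the default loader

echo 'valid: ', var_export($valid, true), PHP_EOL;
echo 'requested: ', implode(', ', $requested), PHP_EOL;
```

Any http:// or https:// entry in the logged list is a resource that validation would otherwise try to fetch over the network, which is a likely place for a fixed delay to come from.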

Joe answered Sep 21 '22 13:09