
Getting PHP's XMLReader to not throw php errors in invalid documents

I'm in the process of writing a parser, and trying to do good error handling with exceptions.

The following sample code:

<?php
$xml = <<<XML
<?xml version="1.0"?>
<rootElem>
XML;

$reader = new XMLReader();
$reader->xml($xml, null, LIBXML_NOERROR | LIBXML_NOWARNING);

$reader->read();

Emits:

PHP Warning:  XMLReader::read(): An Error Occured while reading in /Users/evert/code/xml/errortest.php on line 11
PHP Stack trace:
PHP   1. {main}() /Users/evert/code/xml/errortest.php:0
PHP   2. XMLReader->read() /Users/evert/code/xml/errortest.php:11

The addition of:

libxml_use_internal_errors(true);

Has no effect.

My goal is to check for errors later (with libxml_get_errors()) and throw an exception. I feel the only solution is to use the silence (@) operator, but this seems like a bad idea.

Note that when I don't pass the LIBXML constants, nor use libxml_use_internal_errors, I get a different error, such as:

PHP Warning:  XMLReader::read(): /Users/evert/code/xml/:2: parser error : Extra content at the end of the document in /Users/evert/code/xml/errortest.php on line 11

This suggests that the underlying libxml library is indeed suppressing the error, but XMLReader itself raises a warning anyway.

Evert asked Feb 17 '13 23:02



2 Answers

It looks like there is no way to suppress the warning other than with @, since the PHP source for XMLReader::read() contains the following lines:

retval = xmlTextReaderRead(intern->ptr);
if (retval == -1) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "An Error Occured while reading");
    RETURN_FALSE;
} else {
    RETURN_BOOL(retval);
}

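So only the actual parsing errors raised inside xmlTextReaderRead() are suppressed by libxml_use_internal_errors(true) or by the options passed to XMLReader::xml(). The generic "An Error Occured while reading" warning is raised by the PHP extension itself whenever that call returns -1, regardless of those settings.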
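For what it's worth, here is a minimal sketch of how the @ operator could be confined to a single place while still getting the exception the question asks for. The helper name readOrThrow() is made up for illustration and is not part of XMLReader, and using RuntimeException is just one possible choice. It assumes libxml_use_internal_errors(true) is active so that libxml_get_errors() has something to return.

```php
<?php
// Hypothetical helper (not part of XMLReader): advance the reader by one node
// and convert any libxml errors collected during that step into an exception.
function readOrThrow(XMLReader $reader): bool
{
    // @ silences only the generic E_WARNING that read() raises on failure;
    // the detailed libxml errors are still collected internally.
    $ok = @$reader->read();

    $errors = libxml_get_errors();
    libxml_clear_errors();

    if ($errors) {
        $first = $errors[0];
        throw new RuntimeException(sprintf(
            '%s (line %d, column %d)',
            trim($first->message),
            $first->line,
            $first->column
        ));
    }

    return $ok;
}

// Collect libxml errors instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

// Same truncated document as in the question.
$xml = <<<XML
<?xml version="1.0"?>
<rootElem>
XML;

$reader = new XMLReader();
$reader->xml($xml);

$caught = null;
try {
    while (readOrThrow($reader)) {
        // process $reader->nodeType, $reader->name, ... here
    }
} catch (RuntimeException $e) {
    $caught = $e->getMessage();
} finally {
    $reader->close();
    libxml_use_internal_errors($previous);
}
```

With this, the rest of the parser never sees the PHP warning, and the message from libxml ends up in the exception instead.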

lazyhammer answered Nov 19 '22 13:11



As far as I understand, XMLReader has to make one full pass through the whole document in order to validate it.

What I'm doing is:

// Collect libxml errors internally instead of emitting them as PHP warnings
libxml_use_internal_errors(true);

$xml = new \XMLReader();
$xsd = 'myfile.xsd';
$xml->open('myfile.xml');
$xml->setSchema($xsd);

// Make one full pass through the document. The only reason is to force validation.
// @ silences the generic warning that read() itself raises on failure.
while (@$xml->read()) {
    // intentionally empty
}

if (count(libxml_get_errors()) == 0) {
    echo "provided xml is well formed and xsd-valid";
    // Now you can start processing without @ as the document was validated against the xsd and is well-formed
} else {
    echo "provided xml is wrong and/or not xsd-valid. stopping";
}

Of course, you can check for errors inside the loop and break immediately after the first one. I've noticed that XMLReader does not fail completely after the first error: it continues and returns an array of issues, which is useful. Sometimes it is better to print out all the issues found instead of stopping at the first problem.
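Both variants can be sketched roughly as below. The function name collectXmlErrors(), the $stopAtFirst flag, and the tiny schema and document are all made up for illustration; the sample data is written to temporary files only so that the snippet runs on its own. How many errors libxml reports per problem can vary, so treat the output as indicative.

```php
<?php
// Hypothetical helper: run one validating pass and return the LibXMLError
// objects collected along the way, optionally stopping at the first problem.
function collectXmlErrors(string $xmlPath, string $xsdPath, bool $stopAtFirst): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->open($xmlPath);
    $reader->setSchema($xsdPath);

    // @ silences the generic warning that read() raises on failure.
    while (@$reader->read()) {
        if ($stopAtFirst && libxml_get_errors()) {
            break; // stop as soon as anything has been reported
        }
    }

    $errors = libxml_get_errors();
    libxml_clear_errors();
    $reader->close();
    libxml_use_internal_errors($previous);

    return $errors;
}

// Made-up schema: <items> holding one or more integer <qty> elements.
$xsd = <<<XSD
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="qty" type="xs:integer" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Made-up document with two values that are not integers.
$doc = <<<XML
<?xml version="1.0"?>
<items>
  <qty>1</qty>
  <qty>two</qty>
  <qty>three</qty>
</items>
XML;

$xsdPath = tempnam(sys_get_temp_dir(), 'xsd');
$xmlPath = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($xsdPath, $xsd);
file_put_contents($xmlPath, $doc);

$first = collectXmlErrors($xmlPath, $xsdPath, true);  // break on first error
$all   = collectXmlErrors($xmlPath, $xsdPath, false); // full pass

// Print every issue found during the full pass.
foreach ($all as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

unlink($xsdPath);
unlink($xmlPath);
```

The full pass costs more on large files but reports everything in one go, while breaking early is cheaper when any error means the document gets rejected anyway.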

What I still wonder is what the isValid() function in XMLReader is for :) I think this is in fact a kind of workaround, but it works very well, and validating before processing covers 95% of XMLReader use cases, since it is mostly used for processing large XML collections.

Hubert Muller answered Nov 19 '22 15:11
