
parser error : XML declaration allowed only at the start of the document

Tags: php, xml

I have an XML file that contains multiple XML declarations, like the following:

<?xml version="1.0" encoding="UTF-8"?>
<root>
 <node>
  <element1>Stefan</element1>
  <element2>42</element2>
  <element3>Shirt</element3>
  <element4>3000</element4>  
</node>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
 <node>
  <element1>Damon</element1>
  <element2>32</element2>
  <element3>Jeans</element3>
  <element4>4000</element4>  
</node>
</root>

When I try to load the XML with

$data = simplexml_load_file("testdoc.xml") or die("Error: Cannot create object");

it gives me the following errors:

Warning: simplexml_load_file(): testdoc.xml:11: parser error : XML declaration allowed only at the start of the document in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): <?xml version="1.0" encoding="UTF-8"?> in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): ^ in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): testdoc.xml:12: parser error : Extra content at the end of the document in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): <root> in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): ^ in C:\xampp\htdocs\crea\services\testxml.php on line 3
Error: Cannot create object

Please let me know how to parse this XML, or how to split it into a number of separate XML files that I can read individually. The file size is around 1 GB.

Sahil asked Feb 11 '23 22:02


2 Answers

The second XML declaration

<?xml version="1.0" encoding="UTF-8"?>

needs to be removed. Only one XML declaration is allowed per document, and it must come at the very start of the file.

Strictly speaking, you also need to have a single root element (though I've seen lenient parsers). Just wrap the contents in a synthetic root tag, so that your file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<metaroot><!-- synthetic unique root, no semantics attached -->
    <root>
        <!-- ... -->
    </root>
    <root>
        <!-- ... -->
    </root>

    <!-- ... -->
</metaroot>

Solution for (very) large files:

Use sed to eliminate the offending XML declarations and printf to add a single XML declaration plus a unique root element. A sequence of bash commands follows:

  printf "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<metaroot>\n" >out.xml
  sed '/<?xml /d' in.xml >>out.xml
  printf "\n</metaroot>\n" >>out.xml

in.xml denotes your original file, out.xml the purged result.

printf prints a single XML declaration and the opening/closing tags. sed is a tool that edits a file line by line, performing actions contingent on regex pattern matches. The pattern to match is the start of the XML declaration (<?xml followed by a space); the action to perform is to delete that line.

Notes:

  • the backslashes in the printf commands escape the double quotes inside the double-quoted strings.
  • in sed's basic regular expressions a plain ? is a literal character; GNU sed treats \? as a "zero or one" quantifier, so <\?xml would also match any line that merely contains "xml ".
  • sed is available for Windows and macOS too.
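
Before running these steps on a 1 GB file, it may be worth trying them on a tiny sample first. The following is only a minimal sketch, assuming a POSIX shell with sed, grep and mktemp; the sample content and the temporary directory are made up for illustration. The declaration pattern is written as <?xml because ? is a literal character in basic regular expressions.

```shell
#!/bin/sh
# Work in a throwaway directory so no real file is overwritten (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small input made of two concatenated documents, as in the question.
cat > in.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<root>
 <node><element1>Stefan</element1></node>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
 <node><element1>Damon</element1></node>
</root>
EOF

# Same three steps: write one declaration and an opening synthetic root,
# append the input without its declaration lines, then close the root.
printf '<?xml version="1.0" encoding="UTF-8"?>\n<metaroot>\n' > out.xml
sed '/<?xml /d' in.xml >> out.xml
printf '\n</metaroot>\n' >> out.xml

# Sanity checks: count the lines holding a declaration and the <root> lines.
decl_count=$(grep -c '<?xml ' out.xml)
root_count=$(grep -c '<root>' out.xml)
echo "declarations: $decl_count, root elements: $root_count"
```

On this sample the script should report one declaration and two root elements. If xmllint happens to be installed, running xmllint --noout out.xml would additionally confirm that the result is well-formed.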

Alternate solution

Another option is to split the file into individual well-formed files (taken from a Stack Overflow answer):

csplit -z -f 'temp' -b 'out%03d.xml' in.xml '/<\?xml /' '{*}'

which produces files named tempout000.xml, tempout001.xml, ... (the -f prefix is prepended to the -b suffix format). You should know at least the order of magnitude of the number of documents that were concatenated into your input file to be safe with the autonumbering (though you could of course use the byte count of the input file as an upper bound, using -b 'out%09d.xml' in the above command).
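
To see what the split produces, the command can be tried on a small sample. This is a rough sketch that assumes GNU csplit (the -z and -b options and the {*} repeat are not available in every csplit implementation); the sample content, the temporary directory and the variable names are illustrative. The -s option is added here only to silence the per-file byte counts.

```shell
#!/bin/sh
# Work in a throwaway directory so no real file is overwritten (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two concatenated documents, as in the question.
cat > in.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<root>
 <node><element1>Stefan</element1></node>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
 <node><element1>Damon</element1></node>
</root>
EOF

# Start a new output file at every line that contains an XML declaration.
# -z drops the empty piece that precedes the first declaration.
csplit -s -z -f 'temp' -b 'out%03d.xml' in.xml '/<?xml /' '{*}'

# List the pieces together with their first line.
piece_count=0
for piece in tempout*.xml; do
  echo "$piece: $(head -n 1 "$piece")"
  piece_count=$((piece_count + 1))
done
echo "pieces: $piece_count"
```

Each piece should begin with its own XML declaration, so every output file can then be loaded on its own, for example with simplexml_load_file as in the question.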

collapsar answered May 07 '23 11:05


This is not well-formed XML. You will need to use string functions to split it, or, more precisely, to read it part by part.

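$xmlDeclaration = '<?xml version="1.0" encoding="UTF-8"?>';
$filename = 'testdoc.xml'; // the file from the question

// Read line by line so the whole file never has to fit in memory at once.
$file = new SplFileObject($filename, 'r');
$file->setFlags(SplFileObject::SKIP_EMPTY);
$buffer = '';
foreach ($file as $line) {
  if (FALSE === strpos($line, $xmlDeclaration)) {
    // Still inside the current document: keep collecting lines.
    $buffer .= $line;
  } else {
    // A new declaration starts the next document: flush the previous one.
    outputBuffer($buffer);
    $buffer = $line;
  }
}
// Flush the last document.
outputBuffer($buffer);

// Parse one complete document and print the text of its first element1.
function outputBuffer($buffer) {
  if (trim($buffer) !== '') {
    $dom = new DOMDocument();
    $dom->loadXML($buffer);
    $xpath = new DOMXPath($dom);
    echo $xpath->evaluate('string(//element1)'), "\n";
  }
}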

Output:

Stefan
Damon
ThW answered May 07 '23 13:05