How to parse an XML file with multiple XML declarations using PHP? (A concatenation of several XML files)

Tags:

php

xml

Format of the XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE >
<root>
 <node>
  <element1></element1>
  <element2></element2>
  <element3></element3>
  <element4></element4>
 </node>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE >
<root>
 <node>
  <element1></element1>
  <element2></element2>
  <element3></element3>
  <element4></element4>
 </node>
</root>

Several more documents follow, each with its own XML declaration. The file is about 500 MB. How can I parse this file in PHP without splitting it into separate files?
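
A minimal sketch reproducing the failure, using two tiny hypothetical documents in place of the real file. The DocType lines are left out because the placeholder <!DOCTYPE > is not itself well-formed. PHP's DOMDocument::loadXML() rejects the input because a well-formed XML document may contain only one XML declaration and one document element.

```php
<?php
// Hypothetical stand-in for the real file: two tiny documents concatenated.
$raw = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<root><node/></root>\n"
     . '<?xml version="1.0" encoding="UTF-8"?>' . "\n<root><node/></root>\n";

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
// Fails: a second XML declaration / document element is not allowed.
$ok = $dom->loadXML($raw);

$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

echo $ok ? 'parsed' : 'failed: ' . implode('; ', $messages), PHP_EOL;
```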

Any help would be appreciated. Thank you.

asked Dec 05 '25 13:12 by Jan Mark


1 Answer

If you do not want to split the file, you will have to work with it in memory. Given your 500 MB file size, this could prove problematic. Anyway, one option is to remove the XML prolog and DocType from every document and then load the whole thing like this:

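// Read the whole concatenated file into memory (about 500 MB here). Note that
// str_replace() and sprintf() below return new strings, so several full
// copies may be held in memory at once, on top of the DOM tree itself.
$xml = file_get_contents('/path/to/your/file.xml');

// Strip every per-document prolog and DocType, then re-add a single prolog
// and DocType and wrap all <root> elements in one new <roots> element.
$dom = new DOMDocument;
$dom->loadXML(
    sprintf(
        '<?xml version="1.0" encoding="UTF-8"?>%s' .
        '<!DOCTYPE >%s' .
        '<roots>%s</roots>',
        PHP_EOL,
        PHP_EOL,
        str_replace(
            array(
                '<?xml version="1.0" encoding="UTF-8"?>',
                '<!DOCTYPE >'
            ),
            '',
            $xml
        )
    )
);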

This produces one large XML document with a single XML prolog and a single DocType (note that I am assuming the DocType is the same for all documents in the file). You can then process it by iterating over the individual <root> elements.
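
A sketch of the same "strip, wrap, parse" idea followed by the iteration step, under hypothetical assumptions: the regexes stand in for the exact-match str_replace() so that prologs differing slightly (for example lowercase "utf-8") are still removed; no DocType has an internal subset or is needed for entity definitions, so none is re-added; and the wrapper name "roots" and the helper wrapConcatenatedXml() are illustrative only. A small inline string replaces the real file.

```php
<?php
// Hypothetical helper: remove all XML declarations and simple DocTypes,
// then wrap everything in a single new document element.
// Assumption: no DocType contains an internal subset ("[ ... ]").
function wrapConcatenatedXml(string $raw, string $wrapper = 'roots'): string
{
    // Drop every XML declaration, whatever attributes it carries.
    $body = preg_replace('/<\?xml\s[^?]*\?>/', '', $raw);
    // Drop every simple DocType declaration.
    $body = preg_replace('/<!DOCTYPE[^>\[]*>/', '', $body);

    return '<?xml version="1.0" encoding="UTF-8"?>' . PHP_EOL
        . "<$wrapper>" . $body . "</$wrapper>";
}

// Two tiny stand-in documents; the second prolog uses lowercase "utf-8",
// which an exact str_replace() match would leave in place.
$raw = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<!DOCTYPE root>\n"
     . "<root><node><element1>first</element1></node></root>\n"
     . '<?xml version="1.0" encoding="utf-8"?>' . "\n<!DOCTYPE root>\n"
     . "<root><node><element1>second</element1></node></root>\n";

$dom = new DOMDocument;
if (!$dom->loadXML(wrapConcatenatedXml($raw))) {
    throw new RuntimeException('Wrapped input is still not well-formed');
}

$xpath = new DOMXPath($dom);
$values = [];
// Each /roots/root element corresponds to one of the concatenated documents.
foreach ($xpath->query('/roots/root') as $i => $root) {
    // Query relative to the current <root> only.
    foreach ($xpath->query('node/element1', $root) as $el) {
        $values[] = $el->textContent;
        echo "document $i: {$el->textContent}", PHP_EOL;
    }
}
```

Scoping the inner XPath query to each <root> keeps the per-document processing separate, which mirrors how the original files would have been handled one at a time.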

answered Dec 08 '25 02:12 by Gordon


