Format of the xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE >
<root>
<node>
<element1></element1>
<element2></element2>
<element3></element3>
<element4></element4>
</node>
</root>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE >
<root>
<node>
<element1></element1>
<element2></element2>
<element3></element3>
<element4></element4>
</node>
</root>
and several more XML documents follow in the same file. By the way, the file size is 500MB. How can I parse this file with PHP without breaking it up into separate files?
Any help would be appreciated. Thank you.
If you do not want to split the file, you will have to work with it in memory. Given your 500MB file size, this could turn out to be problematic. Anyway, one option would be to remove the XML prolog and DocType from all documents, wrap what remains in a single new root element, and then load the whole thing like this:
$dom = new DOMDocument;
$dom->loadXML(
    sprintf(
        '<?xml version="1.0" encoding="UTF-8"?>%s' .
        '<!DOCTYPE >%s' .
        '<roots>%s</roots>',
        PHP_EOL,
        PHP_EOL,
        str_replace(
            array(
                '<?xml version="1.0" encoding="UTF-8"?>',
                '<!DOCTYPE >'
            ),
            '',
            file_get_contents('/path/to/your/file.xml')
        )
    )
);
This would make it one huge XML file with just one XML prolog and one DocType (note I am assuming the DocType is the same for all documents in the file). You could then process the file by iterating over the individual root elements.
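To illustrate the iteration step, here is a minimal sketch that runs on a small in-memory sample rather than the real 500MB file. The sample string, the text values inside the elements and the `$values` array are made up for the example, and the DocType placeholder is left out so the snippet stays well-formed. It only assumes that every `<root>` ends up as a direct child of the wrapping `<roots>` element:

```php
<?php
// Hypothetical sample standing in for the concatenated file;
// element names follow the question, the text values are invented.
$concatenated = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<root>
<node>
<element1>a</element1>
<element2>b</element2>
</node>
</root>
<?xml version="1.0" encoding="UTF-8"?>
<root>
<node>
<element1>c</element1>
<element2>d</element2>
</node>
</root>
XML;

// Strip every XML prolog, then wrap all documents in one <roots> element.
$body = str_replace('<?xml version="1.0" encoding="UTF-8"?>', '', $concatenated);

$dom = new DOMDocument;
$dom->loadXML(
    '<?xml version="1.0" encoding="UTF-8"?>' . PHP_EOL .
    '<roots>' . $body . '</roots>'
);

// Each <root> child of <roots> is one of the original documents.
$values = array();
foreach ($dom->documentElement->childNodes as $child) {
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue; // skip whitespace text nodes between the documents
    }
    foreach ($child->getElementsByTagName('element1') as $element) {
        $values[] = $element->nodeValue;
    }
}

print_r($values);
```

With the real file you would replace the sample string with the `file_get_contents()` call from above, keeping in mind that the whole document tree then has to fit into memory.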