I have to parse large XML files in PHP; one of them is 6.5 MB and they could be even bigger. The SimpleXML extension, as I've read, loads the entire file into an object, which may not be very efficient. In your experience, what would be the best way?
Example code:

// open the XML file
$reader = new XMLReader();
$reader->open('books.xml');

// prepare a DOM document
$document = new DOMDocument();
$xpath = new DOMXpath($document);

// find the first `book` element node at any depth
while ($reader->read() && $reader->localName !== 'book');

// iterate the `book` siblings, expanding one node at a time into the DOM
while ($reader->localName === 'book') {
    $book = $document->importNode($reader->expand(), true);
    // ... query the subtree, e.g. $xpath->evaluate('string(title)', $book) ...
    $reader->next('book');
}
$reader->close();
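This hybrid approach keeps memory use low: XMLReader streams the file, so only the current `book` subtree is expanded into the DOM at any moment, while DOMXpath still gives convenient querying on that subtree. (The `title` element in the comment is just an assumption about the file's structure for illustration.)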
For a large file, you'll want to use a SAX parser rather than a DOM parser.
A DOM parser reads the whole file and loads it into an object tree in memory. A SAX parser reads the file sequentially and calls your user-defined callback functions to handle the data (start tags, end tags, CDATA sections, etc.).
With a SAX parser you'll need to maintain state yourself (e.g. which tag you are currently in), which makes it a bit more complicated, but for a large file it will be much more memory-efficient.
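In PHP, the expat-based xml_parser_* functions provide this kind of SAX-style interface. Here is a minimal sketch of the maintain-state-yourself pattern; the file name `books.xml` and the `title` element are placeholders for illustration, not part of the original question:

// state you maintain yourself: the tag we are currently inside
$currentTag = '';
$titles = [];

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$currentTag) {
        // expat upper-cases element names unless case folding is disabled
        $currentTag = $name;
    },
    function ($parser, $name) use (&$currentTag) {
        $currentTag = '';
    }
);

// character data may arrive in several chunks per text node
xml_set_character_data_handler($parser, function ($parser, $data) use (&$currentTag, &$titles) {
    if ($currentTag === 'TITLE' && trim($data) !== '') {
        $titles[] = trim($data);
    }
});

// feed the file to the parser in small chunks so it is never held in memory all at once
$handle = fopen('books.xml', 'rb');
while (!feof($handle)) {
    xml_parse($parser, fread($handle, 8192), feof($handle));
}
fclose($handle);
xml_parser_free($parser);

Unlike a DOM approach, nothing is kept in memory here beyond the current 8 KB chunk and whatever state variables you choose to accumulate.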
My take on it:
https://github.com/prewk/XmlStreamer
A simple class that extracts all children of the XML root element while streaming the file. Tested on a 108 MB XML file from pubmed.com.
class SimpleXmlStreamer extends XmlStreamer {
    public function processNode($xmlString, $elementName, $nodeIndex) {
        $xml = simplexml_load_string($xmlString);

        // Do something with your SimpleXML object

        return true;
    }
}

$streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
$streamer->parse();