Best way to process large XML in PHP [duplicate]

I have to parse large XML files in PHP. One of them is 6.5 MB, and they could get even bigger. From what I've read, the SimpleXML extension loads the entire file into an object, which may not be very efficient. In your experience, what would be the best way?

Petruza asked Jul 22 '09 17:07


2 Answers

For a large file, you'll want to use a SAX parser rather than a DOM parser.

A DOM parser reads the whole file and loads it into an object tree in memory. A SAX parser reads the file sequentially and calls your user-defined callback functions to handle the data (start tags, end tags, character data, etc.).

With a SAX parser you'll need to maintain state yourself (e.g. which tag you are currently in), which makes it a bit more complicated, but for a large file it will be much more memory-efficient.
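
To make the callback-plus-state idea concrete, here is a minimal sketch using PHP's built-in Expat-style functions (`xml_parser_create` and friends). The function name `collectText`, the 8 KB chunk size, and the element name you pass in are assumptions for illustration, not something from the question. The file is fed to the parser in chunks, so only one chunk and the current text buffer are held in memory at a time.

```php
<?php
// Sketch only: collect the text of every <$target> element while streaming.
// collectText() and its parameters are hypothetical names for this example.
function collectText(string $path, string $target): array
{
    $values = [];
    $inside = false; // state we maintain ourselves: are we inside <$target>?
    $buffer = '';    // text gathered for the element currently being read

    $parser = xml_parser_create();
    // Keep element names as written (the default upper-cases them).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start-tag callback.
        function ($p, $name, $attrs) use (&$inside, &$buffer, $target) {
            if ($name === $target) {
                $inside = true;
                $buffer = '';
            }
        },
        // End-tag callback.
        function ($p, $name) use (&$inside, &$buffer, &$values, $target) {
            if ($name === $target) {
                $values[] = trim($buffer);
                $inside = false;
            }
        }
    );

    // Character data may arrive in several pieces, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$inside, &$buffer) {
            if ($inside) {
                $buffer .= $data;
            }
        }
    );

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        // The third argument tells the parser whether this is the final chunk.
        if (!xml_parse($parser, $chunk, feof($handle))) {
            $message = xml_error_string(xml_get_error_code($parser));
            $line = xml_get_current_line_number($parser);
            fclose($handle);
            throw new RuntimeException("XML error on line $line: $message");
        }
    }

    fclose($handle);
    return $values;
}
```

Calling `collectText('books.xml', 'title')` (hypothetical file and element name) would return the text of each `title` element in document order. For nested or repeated structures you would typically replace the single `$inside` flag with a stack of open element names.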

Eric Petroelje answered Sep 29 '22 00:09


My take on it:

https://github.com/prewk/XmlStreamer

A simple class that extracts all children of the XML root element while streaming the file. Tested on a 108 MB XML file from pubmed.com.

    class SimpleXmlStreamer extends XmlStreamer {
        public function processNode($xmlString, $elementName, $nodeIndex) {
            $xml = simplexml_load_string($xmlString);

            // Do something with your SimpleXML object

            return true;
        }
    }

    $streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
    $streamer->parse();
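
If you would rather not add a dependency, roughly the same "one root child at a time" pattern can be sketched with PHP's built-in `XMLReader`. To be clear, this is not how XmlStreamer is implemented; it is a swapped-in technique, and the function name `streamChildren` and its callback signature are made up for this example. It assumes the child elements do not rely on namespace prefixes declared on the root.

```php
<?php
// Sketch only: hand each direct child of the root element to a callback
// as a SimpleXMLElement. streamChildren() is a hypothetical helper.
function streamChildren(string $path, callable $handle): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the root element.
    while ($reader->read() && $reader->nodeType !== XMLReader::ELEMENT) {
        // Skip anything before the root (comments, doctype, ...).
    }
    $rootDepth = $reader->depth;

    $count = 0;
    $ok = $reader->read(); // step into the root's content

    // Stop once the reader climbs back up to the root's closing tag.
    while ($ok && $reader->depth > $rootDepth) {
        $isChild = $reader->nodeType === XMLReader::ELEMENT
            && $reader->depth === $rootDepth + 1;

        if ($isChild) {
            // Only this one subtree is materialised in memory.
            $node = simplexml_load_string($reader->readOuterXml());
            if ($node !== false) {
                $handle($node, $count);
                $count++;
            }
            $ok = $reader->next(); // jump past the subtree just handled
        } else {
            $ok = $reader->read(); // whitespace, comments, etc.
        }
    }

    $reader->close();
    return $count;
}
```

Usage would look like `streamChildren('myLargeXmlFile.xml', function ($xml, $index) { /* work with $xml */ });`, and the return value is the number of children processed. Memory use then depends on the size of the largest single child rather than on the whole file.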
oskarth answered Sep 29 '22 00:09