I need to parse XML files that are around 40GB in size, then normalize the data and insert it into a MySQL database. It is not yet clear how much of each file I need to store in the database, nor do I know the XML structure.
Which parser should I use, and how would you go about doing this?
In PHP, you can read extremely large XML files with XMLReader (Docs):
$reader = new XMLReader();
$reader->open($xmlfile);
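Since you don't know the XML structure yet, a first pass that only walks the document and prints element names can help you map it out. A minimal sketch (the depth cap of 3 is an arbitrary choice to keep the output manageable):

$reader = new XMLReader();
$reader->open($xmlfile);

// Forward-only walk: only the current node is held in memory,
// so this works regardless of file size.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth < 3) {
        echo str_repeat('  ', $reader->depth), $reader->localName, "\n";
    }
}

$reader->close();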
Extremely large XML files should be stored in a compressed format on disk; this makes sense because XML has a high compression ratio. For example, gzipped as large.xml.gz.
PHP supports that quite well with XMLReader via the compression wrappers (Docs):
$xmlfile = 'compress.zlib://path/to/large.xml.gz';
$reader = new XMLReader();
$reader->open($xmlfile);
XMLReader allows you to operate on the current element "only". That means it's forward-only. If you need to keep parser state, you need to maintain it yourself.
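For example, a minimal sketch of maintaining your own state, here tracking the current element path while reading (the path format is just an illustration):

$reader = new XMLReader();
$reader->open($xmlfile);

$path = []; // the parser state we maintain ourselves
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        $path[] = $reader->localName;
        // current location, e.g. "catalog/product/name"
        $current = implode('/', $path);
        if ($reader->isEmptyElement) {
            array_pop($path); // <foo/> emits no END_ELEMENT
        }
    } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
        array_pop($path);
    }
}

$reader->close();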
I often find it helpful to wrap the basic movements into a set of iterators that know how to operate on XMLReader, e.g. iterating through elements or child elements only. You'll find this outlined in Parse XML with PHP and XMLReader; a rough sketch of the idea follows.
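As a rough illustration (a sketch, not the article's code): a generator can hide the cursor movement and yield one record at a time as SimpleXML, which then feeds the database inserts. The element name 'record', the table/column names, and the PDO connection details are all assumptions, since the real structure is unknown:

function records(XMLReader $reader, string $name): Generator
{
    // move to the first element with the given local name
    while ($reader->read() && $reader->localName !== $name);
    // yield each match as SimpleXML, then jump past its subtree
    while ($reader->localName === $name) {
        yield new SimpleXMLElement($reader->readOuterXml());
        $reader->next($name);
    }
}

$reader = new XMLReader();
$reader->open('compress.zlib://path/to/large.xml.gz');

$pdo  = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (name) VALUES (?)');

foreach (records($reader, 'record') as $record) {
    $stmt->execute([(string) $record->name]);
}

$reader->close();

Because only one record is expanded at a time, memory use stays bounded by the size of a single record rather than the whole 40GB file.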