 

Processing a large xml file with perl

I have an XML file which is about 200MB in size, and I wish to extract selected information from it on a line-by-line basis.

I have written a script in Perl using the module XML::LibXML to parse the file and then loop over its contents, extracting the information line by line. This is inefficient, as it reads the whole file into memory, but I like XML::LibXML because I can address the information I need by its XPath location.
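For reference, the in-memory approach described above looks roughly like this. The `<record>`/`<name>` element names are hypothetical, and a small inline string stands in for the real 200MB file (which would be loaded with `location => 'big.xml'` instead):

```perl
use strict;
use warnings;
use XML::LibXML;

# Placeholder document; in practice this would be the 200MB file.
my $xml = '<records><record><name>alpha</name></record>'
        . '<record><name>beta</name></record></records>';

# The whole tree is built in memory before any XPath query runs --
# this is why memory usage scales with the size of the file.
my $doc = XML::LibXML->load_xml( string => $xml );

for my $rec ( $doc->findnodes('//record') ) {
    print $rec->findvalue('./name'), "\n";
}
```

The convenience here is full XPath over the whole document; the cost is that the entire DOM is resident in RAM at once.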

Can I get suggestions for ways to make my code more efficient?

Through searching I have become aware of XML::SAX and XML::LibXML::SAX, but I cannot find documentation that explains their usage, and they don't seem to include any kind of XPath addressing.

asked Feb 15 '11 by fir3x

2 Answers

Have you considered the XML::Twig module? It is much more efficient for processing large files, as the CPAN module description states:

NAME

XML::Twig - A perl module for processing huge XML documents in tree mode.

SYNOPSIS

...

It allows minimal resource (CPU and memory) usage by building the tree only for the parts of the documents that need actual processing, through the use of the twig_roots and twig_print_outside_roots options.

...
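A minimal sketch of the `twig_roots` approach the description refers to, again with hypothetical `<record>`/`<name>` element names and an inline document in place of the real file: only the elements named in `twig_roots` are built as in-memory twigs, and purging each one after handling keeps memory usage flat.

```perl
use strict;
use warnings;
use XML::Twig;

# Placeholder document; for a real file use $twig->parsefile('big.xml').
my $xml = '<records><record><name>alpha</name></record>'
        . '<record><name>beta</name></record></records>';

my @names;
my $twig = XML::Twig->new(
    twig_roots => {
        # Only <record> subtrees are built; everything else streams past.
        'record' => sub {
            my ( $t, $rec ) = @_;
            push @names, $rec->first_child_text('name');
            $t->purge;    # free the processed twig to keep memory flat
        },
    },
);
$twig->parse($xml);

print "$_\n" for @names;
```

Inside a handler, the twig element supports XPath-like navigation (`first_child_text`, `findvalue`, etc.), so the switch from a full DOM is usually small.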

answered Oct 22 '22 by Michael Goldshteyn


I had some luck with XML::Twig but ended up with XML::LibXML::Reader, which is much faster. You may also want to look at XML::LibXML::Pattern if you need XPath-style matching.
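A sketch of combining the two, under the same hypothetical `<record>`/`<name>` structure and an inline document standing in for the real file: a compiled pattern lets the pull-parser skip straight to matching elements, and copying just the current node gives a small DOM on which full XPath works.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;    # also loads XML::LibXML, including Pattern

# Placeholder document; for a real file use location => 'big.xml'.
my $xml = '<records><record><name>alpha</name></record>'
        . '<record><name>beta</name></record></records>';

my $reader  = XML::LibXML::Reader->new( string => $xml );
# Patterns support a subset of XPath, enough for location paths.
my $pattern = XML::LibXML::Pattern->new('//record');

my @names;
while ( $reader->nextPatternMatch($pattern) ) {
    # Skip the end-element event for the same node.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    # Deep-copy only this element into a small DOM fragment,
    # so full XPath can be used on it.
    my $rec = $reader->copyCurrentNode(1);
    push @names, $rec->findvalue('./name');
}

print "$_\n" for @names;
```

Only one `<record>` fragment is materialized at a time, so memory stays bounded regardless of file size.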

answered Oct 22 '22 by Onlyjob