 

Processing a large xml file with perl

I have an XML file which is about 200MB in size, and I wish to extract selected information from it on a line-by-line basis.

I have written a script in Perl using the module XML::LibXML to parse the file and then loop over its contents, extracting the information line by line. This is inefficient, as it reads the whole file into memory, but I like XML::LibXML because I can address the information I need by its XPath location.
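For reference, the in-memory approach described above looks roughly like this. The `<record>`/`<name>` element names are hypothetical, and a small inline string stands in for the real 200MB file (which would be loaded with `location => 'big.xml'` instead):

```perl
use strict;
use warnings;
use XML::LibXML;

# Placeholder document; in practice this would be the 200MB file.
my $xml = '<records><record><name>alpha</name></record>'
        . '<record><name>beta</name></record></records>';

# The whole tree is built in memory before any XPath query runs --
# this is why memory usage scales with the size of the file.
my $doc = XML::LibXML->load_xml( string => $xml );

for my $rec ( $doc->findnodes('//record') ) {
    print $rec->findvalue('./name'), "\n";
}
```

The convenience here is full XPath over the whole document; the cost is that the entire DOM is resident in RAM at once.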

Can I get suggestions for ways to make my code more efficient?

Through searching I have become aware of XML::SAX and XML::LibXML::SAX, but I cannot find documentation that explains their usage, and they don't seem to include any kind of XPath addressing.

asked Feb 15 '11 by fir3x

2 Answers

Have you considered the XML::Twig module? It is much more efficient for processing large files, as the CPAN module description states:

NAME

XML::Twig - A perl module for processing huge XML documents in tree mode.

SYNOPSIS

...

It allows minimal resource (CPU and memory) usage by building the tree only for the parts of the documents that need actual processing, through the use of the twig_roots and twig_print_outside_roots options.

...
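A minimal sketch of the `twig_roots` approach the description refers to, again with hypothetical `<record>`/`<name>` element names and an inline document in place of the real file: only the elements named in `twig_roots` are built as in-memory twigs, and purging each one after handling keeps memory usage flat.

```perl
use strict;
use warnings;
use XML::Twig;

# Placeholder document; for a real file use $twig->parsefile('big.xml').
my $xml = '<records><record><name>alpha</name></record>'
        . '<record><name>beta</name></record></records>';

my @names;
my $twig = XML::Twig->new(
    twig_roots => {
        # Only <record> subtrees are built; everything else streams past.
        'record' => sub {
            my ( $t, $rec ) = @_;
            push @names, $rec->first_child_text('name');
            $t->purge;    # free the processed twig to keep memory flat
        },
    },
);
$twig->parse($xml);

print "$_\n" for @names;
```

Inside a handler, the twig element supports XPath-like navigation (`first_child_text`, `findvalue`, etc.), so the switch from a full DOM is usually small.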

answered Oct 22 '22 by Michael Goldshteyn


I had some luck with XML::Twig but ended up with XML::LibXML::Reader, which is much faster. You may also want to look at XML::LibXML::Pattern if you need XPath-style matching.
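A sketch of combining the two, under the same hypothetical `<record>`/`<name>` structure and an inline document standing in for the real file: a compiled pattern lets the pull-parser skip straight to matching elements, and copying just the current node gives a small DOM on which full XPath works.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;    # also loads XML::LibXML, including Pattern

# Placeholder document; for a real file use location => 'big.xml'.
my $xml = '<records><record><name>alpha</name></record>'
        . '<record><name>beta</name></record></records>';

my $reader  = XML::LibXML::Reader->new( string => $xml );
# Patterns support a subset of XPath, enough for location paths.
my $pattern = XML::LibXML::Pattern->new('//record');

my @names;
while ( $reader->nextPatternMatch($pattern) ) {
    # Skip the end-element event for the same node.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    # Deep-copy only this element into a small DOM fragment,
    # so full XPath can be used on it.
    my $rec = $reader->copyCurrentNode(1);
    push @names, $rec->findvalue('./name');
}

print "$_\n" for @names;
```

Only one `<record>` fragment is materialized at a time, so memory stays bounded regardless of file size.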

answered Oct 22 '22 by Onlyjob