
Processing large XML files with Perl XML::LibXML is very slow

Tags:

xml

perl

The XML file is like this:

<?xml version="1.0" encoding="UTF-8"?>
<resource-data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="resource-data.xsd">
  <class name="AP">
    <attributes>
      <resourceId>00 11 B5 1B 6D 20</resourceId>
      <lastModifyTime>20130107091545</lastModifyTime>
      <dcTime>20130107093019</dcTime>
      <attribute name="NMS_ID" value="DNMS" />
      <attribute name="IP_ADDR" value="10.11.141.111" />
      <attribute name="LABEL_DEV" value="00 11 B5 1B 6D 20" />
    </attributes>
        <attributes>
      <resourceId>00 11 B5 1B 6D 21</resourceId>
      <lastModifyTime>20130107091546</lastModifyTime>
      <dcTime>20130107093019</dcTime>
      <attribute name="NMS_ID" value="DNMS" />
      <attribute name="IP_ADDR" value="10.11.141.112" />
      <attribute name="LABEL_DEV" value="00 11 B5 1B 6D 21" />
    </attributes>
  </class>
</resource-data>

And my code:

#!/usr/bin/perl

use Encode;
use XML::LibXML;
use Data::Dumper;

$parser = new XML::LibXML;
$struct = $parser->parse_file("d:/AP_201301073100_1.xml");

my $file_data = "d:\\ap.txt";
open IN, ">$file_data";

$rootel = $struct->getDocumentElement();
$elname = $rootel->getName();

@kids   = $rootel->getElementsByTagName('attributes');
foreach $child (@kids) {
  @atts = $child->getElementsByTagName('attribute');
  foreach $at (@atts) {
    $va = $at->getAttribute('value');
    print IN encode("gbk", "$va\t");
  }
  print IN encode("gbk", "\n");
}
close(IN);

My question is: when the XML file is only 80MB the program is very fast, but when the XML file is much larger the program becomes very slow. Can somebody help me speed this up, please?

John asked Dec 26 '22 11:12

1 Answer

Using XML::Twig will allow you to process each <attributes> element as it is encountered during parsing, and then discard the XML data that is no longer needed.

This program seems to do what you need.

use strict;
use warnings;

use XML::Twig;
use Encode;

use constant XML_FILE => 'S:/AP_201301073100_1.xml';
use constant OUT_FILE => 'D:/ap.txt';

open my $outfh, '>:encoding(gbk)', OUT_FILE or die $!;

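# Call attributes() each time a complete <attributes> element has been parsed
my $twig = XML::Twig->new(twig_handlers => {attributes => \&attributes});
$twig->parsefile(XML_FILE);

sub attributes {
  my ($twig, $atts) = @_;
  # Collect the value of every <attribute> child, in document order
  my @values = map $_->att('value'), $atts->children('attribute');
  print $outfh join("\t", @values), "\n";
  # Discard everything parsed so far so memory use does not grow with file size
  $twig->purge;
}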

output

DNMS  10.11.141.111 00 11 B5 1B 6D 20
DNMS  10.11.141.112 00 11 B5 1B 6D 21
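
If you would rather stay with XML::LibXML, the same idea (handle one <attributes> element at a time instead of loading the whole document) can probably be had from its pull parser, XML::LibXML::Reader. The following is only a rough sketch, not something I have timed against your data: the helper name each_attributes_row and the command-line handling are made up for illustration, and I am assuming, as in your sample, that the <attribute> elements are direct children of <attributes>.

```perl
use strict;
use warnings;

use XML::LibXML::Reader;

# Hypothetical helper: walk the stream and call $callback once per
# <attributes> element, passing the 'value' of each <attribute> child.
sub each_attributes_row {
    my ($reader, $callback) = @_;

    # nextElement() skips forward to the next element with the given name.
    # It returns 1 on success and 0 when the document is exhausted.
    while ($reader->nextElement('attributes') == 1) {
        # Copy only this <attributes> subtree into a small DOM fragment.
        my $node   = $reader->copyCurrentNode(1);
        my @values = map { $_->getAttribute('value') }
                     $node->getChildrenByTagName('attribute');
        $callback->(@values);
    }
    return;
}

# Assumed invocation: perl stream.pl input.xml output.txt
if (@ARGV == 2) {
    my ($xml_file, $out_file) = @ARGV;

    my $reader = XML::LibXML::Reader->new(location => $xml_file)
        or die "Cannot read $xml_file\n";
    open my $out, '>:encoding(gbk)', $out_file
        or die "Cannot write $out_file: $!";

    # One tab-separated line per <attributes> element, as in the question.
    each_attributes_row($reader, sub { print {$out} join("\t", @_), "\n" });

    close $out or die "Cannot close $out_file: $!";
}
```

The main difference from your original program is that parse_file is never called, so no full DOM tree is built for the whole file. Whether this ends up faster or slower than XML::Twig on your data is something you would need to measure.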
Borodin answered Dec 29 '22 10:12
