How can I pass arguments and return values with XML::Twig's handler?

Tags: perl, xml-twig

My question is: how do I pass arguments to an XML::Twig handler, and how do I return a result from the handler?

Here is my code, which hardcodes these values:

<counter name = "music", report type = "month", stringSet index = 4>.

How can I implement this using the arguments $counter_name, $type and $id, and how can I return the contents of @string_list? Thanks. (Sorry, I did not post the XML file here because I had trouble doing so: anything within < and > is ignored.)

use XML::Twig;

sub parse_a_counter {

     my ($twig, $counter) = @_;
     my @report = $counter->children('report[@type="month"]');

     for my $report (@report){

         my @stringSet = $report->children('stringSet[@index="4"]');
         for my $stringSet (@stringSet){

             my @string_list = $stringSet->children_text('string');
             print @string_list;  #  in fact I want to return this string_list,
                                  #  not just print it.
         }
     }

     $counter->flush; # free the memory of $counter
}

my $roots = { 'counter[@name="music"]' => 1 };

my $handlers = { counter => \&parse_a_counter };

my $twig = new XML::Twig(TwigRoots => $roots,
                         TwigHandlers => $handlers);

$twig->parsefile('counter_test.xml');
user389955 asked Jul 12 '10 08:07


3 Answers

The easiest and most common way to pass arguments to handlers is to use closures. That's a big word for a simple concept: you register the handler as tag => sub { handler( @_, $my_arg) }, and $my_arg is passed to the handler after the usual arguments. The article "Achieving Closure" explains the concept in more detail.

Below is how I would write the code. I used Getopt::Long for argument processing, and qq{} instead of quotes around strings that contained an XPath expression, to be able to use the quotes in the expression.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

use Getopt::Long;

# set defaults
my $counter_name= 'music';
my $type= 'month';
my $id= 4;

GetOptions ( "name=s" => \$counter_name,
             "type=s" => \$type,
             "id=i"   => \$id,
           ) or die;   

my @results;

my $twig= XML::Twig->new( 
            twig_roots => { qq{counter[\@name="$counter_name"]} 
                             => sub { parse_a_counter( @_, $type, $id, \@results); } } )
                   ->parsefile('counter_test.xml');

print join( "\n", @results), "\n";

sub parse_a_counter {

     my ($twig, $counter, $type, $id, $results) = @_;
     my @report = $counter->children( qq{report[\@type="$type"]});

     for my $report (@report){

         my @stringSet = $report->children( qq{stringSet[\@index="$id"]});
         for my $stringSet (@stringSet){

             my @string_list = $stringSet->children_text('string');
             push @$results, @string_list;
         }
     }

     $counter->purge; # free the memory of $counter
}
mirod answered Oct 17 '22 04:10


DISCLAIMER: I have not used Twig myself, so this answer might not be idiomatic - it is a generic "how do I keep state in a callback handler" answer.

Three ways of passing information in and out of the handlers are:

ONE. State held in a static location

package TwigState;

my %state = ();
# Pass in a state attribute to get
sub getState { $state{$_[0]} }
# Pass in a state attribute to set and a value
sub setState { $state{$_[0]} = $_[1]; }

package main;

sub parse_a_counter { # Better yet, declare all handlers in TwigState
     my ($twig, $element) = @_;
     my $counter = TwigState::getState('counter');
     $counter++;
     TwigState::setState('counter', $counter);
}

TWO. State held in the XML::Twig object ($twig) itself, in some "state" member

# Ideally, XML::Twig or XML::Parser would have a "context" member
# to store context, and methods to get/set that context.
# Barring that, you could make one yourself, using the VERY VERY bad
# design decision of treating the object as a hash and adding a key to it.
# I'd STRONGLY recommend against doing that, and choosing #1 or #3 instead,
# unless there's a ready-made context data area in the class.
sub parse_a_counter {
     my ($twig, $element) = @_;
     # getContext/setContext are the hypothetical accessors described above
     my $counter = $twig->getContext('counter');
     # BAD: my $counter = $twig->{'_my_context'}->{'counter'};
     $counter++;
     $twig->setContext('counter', $counter);
     # BAD: $twig->{'_my_context'}->{'counter'} = $counter;
}

# for using DIY context, better pass it in with constructor:
my $twig = new XML::Twig(TwigRoots    => $roots,
                         TwigHandlers => $handlers,
                         _my_context  => {});

THREE. Make the handler a closure and have it keep state that way
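
A minimal sketch of this third option, in plain Perl. The factory name make_counting_handler and the reader $get_count are hypothetical names made up for illustration, and the handler is invoked by hand here rather than by XML::Twig, so the block only shows how the closure keeps its state:

```perl
use strict;
use warnings;

# Hypothetical factory: returns a handler plus a reader for its private state.
sub make_counting_handler {
    my $count = 0;                    # visible only to the two closures below

    my $handler = sub {
        my ($twig, $element) = @_;    # the arguments a twig handler receives
        $count++;
        return 1;
    };
    my $get_count = sub { return $count };

    return ($handler, $get_count);
}

my ($handler, $get_count) = make_counting_handler();

# In a real script the handler would be registered with the parser, e.g.
# as the value for the 'counter' key of the handlers hash.
# Here it is called by hand with placeholder arguments.
$handler->(undef, undef) for 1 .. 3;

print "handler ran ", $get_count->(), " times\n";   # prints "handler ran 3 times"
```

Each call to the factory creates a fresh $count, so several handlers built this way would not share state, which is the main advantage over the static location in option ONE.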

DVK answered Oct 17 '22 03:10


The simplest way is to make parse_a_counter return a sub (i.e. a closure) and store the results in a global variable. For example:

use strict;
use warnings;
use XML::Twig;

our @results;      # <= put results in here

sub parse_a_counter {
    my ($type, $index) = @_;

    # return closure over type & index
    return sub {
        my ($twig, $counter) = @_;
        my @report = $counter->children( qq{report[\@type="$type"]} );

        for my $report (@report) {
            my @stringSet = $report->children( qq{stringSet[\@index="$index"]} );

            for my $stringSet (@stringSet) {
                my @string_list = $stringSet->children_text( 'string' );
                push @results, \@string_list; 
            }
        }
    };
}

my $roots    = { 'counter[@name="music"]' => 1 };
my $handlers = { counter => parse_a_counter( "month", 4 ) };

my $twig = XML::Twig->new(
    TwigRoots    => $roots,                     
    TwigHandlers => $handlers,
)->parsefile('counter_test.xml');

I tested this with the following XML (which is what I could work out from your example XML & code):

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <counter name="music">
        <report type="week">
            <stringSet index="4">
                <string>music week 4</string>
            </stringSet>
        </report> 
    </counter>
    <counter name="xmusic">
        <report type="month">
            <stringSet index="4">
                <string>xmusic month 4</string>
            </stringSet>
        </report> 
    </counter>
    <counter name="music">
        <report type="month"> 
            <stringSet index="4">
                <string>music month 4 zz</string>
                <string>music month 4 xx</string>
            </stringSet>
        </report>
    </counter>
</root>

And I got back this:

[
    [
        'music month 4 zz',
        'music month 4 xx'
    ]
];

Which is what I was expecting!
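
Note that the handler pushes a reference to each string list, so @results is an array of array references, as the output shows. A minimal sketch of reading it back, assuming a flat list of strings is wanted; the sample data is copied from the output above rather than produced by a real parse, and @flat is a name made up for illustration:

```perl
use strict;
use warnings;

# Sample data copied from the output shown above; in the real script
# @results is filled by the handler while parsefile() runs.
my @results = ( [ 'music month 4 zz', 'music month 4 xx' ] );

# Each entry is a reference to one stringSet's list of strings,
# so dereference it to get the strings themselves.
my @flat;
for my $string_list (@results) {
    push @flat, @$string_list;
}

print "$_\n" for @flat;
```

If a flat list is all that is needed, pushing @string_list instead of \@string_list inside the handler would avoid this extra step.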

draegtun answered Oct 17 '22 03:10